NOVEL HUMAN PROTEINS, POLYNUCLEOTIDES ENCODING
THEM AND METHODS OF USING THE SAME



FIELD OF THE INVENTION 

The present invention relates to nucleic acids encoding proteins that are new
members of the following protein families: nuclear protein-like proteins, transforming
acidic coiled-coil-containing protein-like proteins, thyroid hormone receptor interactor
6-like proteins, uroporphyrinogen-III synthase-like proteins, intracellular-like proteins,
LIM domain transcription factor-like proteins, voltage-dependent-calcium channel-like
proteins, dihydropyridine-sensitive L-type-calcium channel-like proteins,
beta-3-subunit-like proteins, nucleoporin-like proteins, BHLH protein DEC2-like proteins,
keratin 18-like proteins, intracellular protein-like proteins, intracellular protein
Tubby-like proteins, symplekin-like proteins, telethonin-like proteins, forkhead protein
03A-like proteins, cytochrome C-like proteins, troponin T-like proteins, XIN-like
proteins, prostatic binding protein-like proteins, cytoplasmic protein like homo
sapiens-like proteins, zinc-finger protein HZF1-like proteins, B4-2-like proteins,
Maternal effect protein staufen-like proteins, desmin like homo sapiens-like proteins,
hypothetical protein-like proteins, tropomyosin alpha chain-like proteins,
Hermansky-Pudlak syndrome-like proteins, NOT2P-like proteins, human
selenium-binding-like proteins, EH domain-binding mitotic phosphoprotein-like proteins,
hypothetical intracellular-like proteins, MHC class I region proline rich protein-like
proteins, nebulin-like proteins, golgi matrix protein GM130-like proteins, microspherule
protein 1-like proteins, AK016419 mus musculus adult male testis cDNA-like proteins,
utrophin (dystrophin-related protein 1)-like proteins, TPR domain-like proteins, LRR
domain containing like homo sapiens-like proteins, G-rich sequence factor-1-like
proteins, cytoplasmic protein-like proteins, meningioma-expressed antigen 6/11 (MEA6)
(MEA11)-like proteins, ancient conserved domain protein 1-like proteins, CDCRL-like
proteins, HPRP-like proteins, PIBF1 protein-like proteins, cytoplasmic protein-like
proteins, zinc-finger/KRAB domain containing protein-like proteins, RHO-Interacting
Protein 3-like proteins, cardiac troponin I-like proteins, guanine nucleotide-binding
protein-like proteins, benzodiazepine receptor (BZRP) like homo sapiens-like proteins,
ankyrin-repeat containing protein like homo sapiens-like proteins, acyltransferase like
homo sapiens-like proteins, GTP-binding-protein SAR1A like homo sapiens-like proteins,
CGI-27 like homo sapiens-like proteins, FLJ20565 like homo sapiens-like proteins,
2410014P07RIK like homo sapiens-like proteins, multidomain presynaptic cytomatrix
protein piccolo like homo sapiens-like proteins, cytosolic-sorting protein
PACS-1A-like-like proteins, formin 2 like homo sapiens-like proteins, novel 5'
nucleotidase-like protein-like proteins, WW domain containing protein like homo
sapiens-like proteins, gasdermin like homo sapiens-like proteins, Tubby super-family
protein splice variant like homo sapiens-like proteins, synaptotagmin-like protein 3-A
like homo sapiens-like proteins, copine I like homo sapiens-like proteins, selenoprotein
X1 like homo sapiens-like proteins, hypothetical WD-repeat like homo sapiens-like
proteins, cytoplasmic protein-like proteins, TNFAIP1-like proteins, ribosomal protein
L29-like proteins, paraneoplastic antigen-like proteins, GTF2IRD2 like homo sapiens-like
proteins, glycolipid transfer protein-like proteins, novel copine VII-like proteins,
sperm membrane protein BS-63-like proteins, FIP-2-like proteins, PEX10-like proteins.

Included in the invention are polynucleotides and the polypeptides encoded by such
polynucleotides, as well as vectors, host cells, antibodies and recombinant methods for
producing the polypeptides and polynucleotides, as well as methods for using the same.
Methods of use encompass diagnostic and prognostic assay procedures as well as methods
of treating diverse pathological conditions.

BACKGROUND OF THE INVENTION



The invention generally relates to nucleic acids and polypeptides encoded therefrom.
More specifically, the invention relates to nucleic acids encoding cytoplasmic, nuclear,
membrane bound, and secreted polypeptides, as well as vectors, host cells, antibodies, and
recombinant methods for producing these nucleic acids and polypeptides.



The present invention is based in part on nucleic acids encoding proteins that are
members of the following protein families: nuclear protein-like proteins, transforming
acidic coiled-coil-containing protein-like proteins, thyroid hormone receptor interactor
6-like proteins, uroporphyrinogen-III synthase-like proteins, intracellular-like proteins,
LIM domain transcription factor-like proteins, voltage-dependent-calcium channel-like
proteins, dihydropyridine-sensitive L-type-calcium channel-like proteins,
beta-3-subunit-like proteins, nucleoporin-like proteins, BHLH protein DEC2-like proteins,
keratin 18-like proteins, intracellular protein-like proteins, intracellular protein
Tubby-like proteins, symplekin-like proteins, telethonin-like proteins, forkhead protein
03A-like proteins, cytochrome C-like proteins, troponin T-like proteins, XIN-like
proteins, prostatic binding protein-like proteins, cytoplasmic protein like homo
sapiens-like proteins, zinc-finger protein HZF1-like proteins, B4-2-like proteins,
Maternal effect protein staufen-like proteins, desmin like homo sapiens-like proteins,
hypothetical protein-like proteins, tropomyosin alpha chain-like proteins,
Hermansky-Pudlak syndrome-like proteins, NOT2P-like proteins, human
selenium-binding-like proteins, EH domain-binding mitotic phosphoprotein-like proteins,
hypothetical intracellular-like proteins, MHC class I region proline rich protein-like
proteins, nebulin-like proteins, golgi matrix protein GM130-like proteins, microspherule
protein 1-like proteins, AK016419 mus musculus adult male testis cDNA-like proteins,
utrophin (dystrophin-related protein 1)-like proteins, TPR domain-like proteins, LRR
domain containing like homo sapiens-like proteins, G-rich sequence factor-1-like
proteins, cytoplasmic protein-like proteins, meningioma-expressed antigen 6/11 (MEA6)
(MEA11)-like proteins, ancient conserved domain protein 1-like proteins, CDCRL-like
proteins, HPRP-like proteins, PIBF1 protein-like proteins, cytoplasmic protein-like
proteins, zinc-finger/KRAB domain containing protein-like proteins, RHO-Interacting
Protein 3-like proteins, cardiac troponin I-like proteins, guanine nucleotide-binding
protein-like proteins, benzodiazepine receptor (BZRP) like homo sapiens-like proteins,
ankyrin-repeat containing protein like homo sapiens-like proteins, acyltransferase like
homo sapiens-like proteins, GTP-binding-protein SAR1A like homo sapiens-like proteins,
CGI-27 like homo sapiens-like proteins, FLJ20565 like homo sapiens-like proteins,
2410014P07RIK like homo sapiens-like proteins, multidomain presynaptic cytomatrix
protein piccolo like homo sapiens-like proteins, cytosolic-sorting protein
PACS-1A-like-like proteins, formin 2 like homo sapiens-like proteins, novel 5'
nucleotidase-like protein-like proteins, WW domain containing protein like homo
sapiens-like proteins, gasdermin like homo sapiens-like proteins, Tubby super-family
protein splice variant like homo sapiens-like proteins, synaptotagmin-like protein 3-A
like homo sapiens-like proteins, copine I like homo sapiens-like proteins, selenoprotein
X1 like homo sapiens-like proteins, hypothetical WD-repeat like homo sapiens-like
proteins, cytoplasmic protein-like proteins, TNFAIP1-like proteins, ribosomal protein
L29-like proteins, paraneoplastic antigen-like proteins, GTF2IRD2 like homo sapiens-like
proteins, glycolipid transfer protein-like proteins, novel copine VII-like proteins,
sperm membrane protein BS-63-like proteins, FIP-2-like proteins, PEX10-like proteins. The
novel polynucleotides and polypeptides are referred to herein as NOV1a, NOV2a, NOV2b,
NOV3a, NOV3b, NOV4a, NOV4b, NOV5a, NOV5b, NOV6a, NOV6b, NOV7a, NOV7b,
NOV7c, NOV7d, NOV7e, NOV8a, NOV9a, NOV9b, NOV10a, NOV10b, NOV11a,
NOV12a, NOV12b, NOV13a, NOV14a, NOV15a, NOV15b, NOV16a, NOV17a, NOV18a,
NOV18b, NOV18c, NOV19a, NOV19b, NOV20a, NOV20b, NOV20c, NOV20d, NOV20e,
NOV20f, NOV20g, NOV21a, NOV21b, NOV22a, NOV23a, NOV23b, NOV24a, NOV25a,
NOV25b, NOV26a, NOV26b, NOV26c, NOV27a, NOV27b, NOV28a, NOV28b, NOV28c,
NOV28d, NOV28e, NOV28f, NOV29a, NOV29b, NOV30a, NOV31a, NOV32a, NOV32b,
NOV33a, NOV34a, NOV35a, NOV35b, NOV35c, NOV36a, NOV36b, NOV37a, NOV37b,
NOV37c, NOV38a, NOV39a, NOV40a, NOV41a, NOV42a, NOV42b, NOV43a, NOV44a,
NOV44b, NOV44c, NOV45a, NOV46a, NOV47a, NOV48a, NOV48b, NOV49a, NOV49b,
NOV50a, NOV50b, NOV50c, NOV51a, NOV52a, NOV52b, NOV53a, NOV54a, NOV54b,
NOV55a, NOV55b, NOV55c, NOV55d, NOV55e, NOV56a, NOV57a, NOV58a, NOV59a,
NOV60a, NOV61a, NOV62a, NOV62b, NOV63a, NOV63b, NOV64a, NOV65a, NOV66a,
NOV66b, NOV67a, NOV68a, NOV69a, NOV70a, NOV71a, NOV72a, NOV72b, NOV72c,
NOV73a, NOV73b, NOV74a, NOV75a, NOV76a, NOV77a, NOV78a, NOV79a, NOV80a,
NOV81a, NOV81b, NOV82a, NOV82b, NOV82c, NOV83a, NOV83b, NOV84a, NOV84b,
NOV84c, NOV85a, NOV85b, NOV86a, NOV86b, NOV87a, NOV87b, NOV87c, NOV87d,
NOV87e, NOV88a, NOV88b, NOV89a, NOV89b, NOV90a, NOV90b, NOV91a, NOV91b,
NOV91c, NOV91d, NOV92a, NOV92b, NOV92c and NOV92d. These nucleic acids and
polypeptides, as well as derivatives, homologs, analogs and fragments thereof, will
hereinafter be collectively designated as "NOVX" nucleic acid or polypeptide sequences.
In one aspect, the invention provides an isolated NOVX nucleic acid disclosed in SEQ
ID NO:2n-1, wherein n is an integer between 1 and 172. In some embodiments, the NOVX
nucleic acid molecule will hybridize under stringent conditions to a nucleic acid sequence
complementary to a nucleic acid molecule that includes a protein-coding sequence of a
NOVX nucleic acid sequence. The invention also includes an isolated nucleic acid that
encodes a NOVX polypeptide, or a fragment, homolog, analog or derivative thereof. For
example, the nucleic acid can encode a polypeptide at least 80% identical to a polypeptide
comprising the amino acid sequences of SEQ ID NO:2n, wherein n is an integer between 1
and 172. The nucleic acid can be, for example, a genomic DNA fragment or a cDNA
molecule that includes the nucleic acid sequence of any of SEQ ID NO:2n-1, wherein n is an
integer between 1 and 172. Also included in the invention is an oligonucleotide, e.g., an
oligonucleotide which includes at least 6 contiguous nucleotides of a NOVX nucleic acid
(e.g., SEQ ID NO:2n-1, wherein n is an integer between 1 and 172) or a complement of said
oligonucleotide.
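
The numbering convention used above (nucleic acid sequences at SEQ ID NO:2n-1 and the encoded amino acid sequences at SEQ ID NO:2n) can be sketched in a few lines of Python. This is only an illustration: the function name `seq_id_pair` is hypothetical, and the sketch assumes that "between 1 and 172" is meant inclusively.

```python
def seq_id_pair(n: int, n_max: int = 172) -> tuple[int, int]:
    """Return (nucleic acid SEQ ID NO, amino acid SEQ ID NO) for index n.

    Assumption: n runs from 1 to n_max inclusive, following the
    "SEQ ID NO:2n-1" / "SEQ ID NO:2n" wording of the specification.
    """
    if not 1 <= n <= n_max:
        raise ValueError(f"n must be between 1 and {n_max}, got {n}")
    return 2 * n - 1, 2 * n


if __name__ == "__main__":
    # n = 1 gives SEQ ID NO:1 (nucleic acid) and SEQ ID NO:2 (amino acid).
    for n in (1, 2, 172):
        nucleic, amino = seq_id_pair(n)
        print(f"n={n}: nucleic acid SEQ ID NO:{nucleic}, amino acid SEQ ID NO:{amino}")
```

Under this scheme every odd SEQ ID NO is a nucleic acid and the next even number is its encoded polypeptide.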

The invention also encompasses isolated NOVX polypeptides (SEQ ID NO:2n,
wherein n is an integer between 1 and 172). In certain embodiments, the NOVX
polypeptides include an amino acid sequence that is substantially identical to the amino acid
sequence of a human NOVX polypeptide.
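
As a rough illustration of how an identity criterion such as the "at least 80% identical" threshold mentioned earlier might be checked computationally, the sketch below computes percent identity over two sequences that have already been aligned to equal length. The function names, the use of "-" as the gap character, and the choice to divide by the number of aligned columns are all assumptions made for this example; the passage above does not prescribe an alignment method or a denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumptions: '-' marks a gap; columns where both sequences have a gap
    are ignored; a residue aligned to a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if percent identity is at least the threshold (80% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy peptide strings invented for the example, not taken from any SEQ ID NO.
    print(percent_identity("MKTAYIAKQR", "MKTAYIAKQL"))
    print(meets_identity_threshold("MKTAYIAKQR", "MKTA--AKQL"))
```

Different tools define percent identity differently (for instance, over the shorter sequence or over the full alignment), so any real comparison should state which convention it uses.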

The invention also features antibodies that immunoselectively bind to NOVX
polypeptides, or fragments, homologs, analogs or derivatives thereof.

In another aspect, the invention includes pharmaceutical compositions that include
therapeutically- or prophylactically-effective amounts of a therapeutic and a
pharmaceutically-acceptable carrier. The therapeutic can be, e.g., a NOVX nucleic acid, a
NOVX polypeptide, or an antibody specific for a NOVX polypeptide. In a further aspect, the
invention includes, in one or more containers, a therapeutically- or prophylactically-effective
amount of this pharmaceutical composition.

In a further aspect, the invention includes a method of producing a polypeptide by
culturing a cell that includes a NOVX nucleic acid, under conditions allowing for expression
of the NOVX polypeptide encoded by the DNA. If desired, the NOVX polypeptide can then
be recovered.

In another aspect, the invention includes a method of detecting the presence of a
NOVX polypeptide in a sample. In the method, a sample is contacted with a compound that
selectively binds to the polypeptide under conditions allowing for formation of a complex
between the polypeptide and the compound. The complex is detected, if present, thereby
identifying the NOVX polypeptide within the sample.

The invention also includes methods to identify specific cell or tissue types based on 
their expression of a NOVX. 

Also included in the invention is a method of detecting the presence of a NOVX
nucleic acid molecule in a sample by contacting the sample with a NOVX nucleic acid probe
or primer, and detecting whether the nucleic acid probe or primer bound to a NOVX nucleic
acid molecule in the sample.
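
Probe or primer binding is a laboratory procedure, but the sequence relationship behind it can be illustrated in silico. The sketch below is a loose analogue only: it tests whether an oligonucleotide shares a run of at least 6 contiguous nucleotides (the minimum length mentioned earlier for NOVX oligonucleotides) with a reference sequence or with its reverse complement. The function names, the restriction to the unambiguous bases A, C, G and T, and the toy sequences are assumptions for this example and do not model hybridization stringency.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string made of A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_contiguous_run(oligo: str, reference: str, k: int = 6) -> bool:
    """True if oligo and reference (either strand) share k contiguous bases."""
    oligo = oligo.upper()
    reference = reference.upper()
    if k <= 0:
        raise ValueError("k must be positive")
    if len(oligo) < k or len(reference) < k:
        return False
    # Every length-k window of the oligonucleotide.
    oligo_kmers = {oligo[i:i + k] for i in range(len(oligo) - k + 1)}
    # Scan the reference strand and its reverse complement.
    for strand in (reference, reverse_complement(reference)):
        for i in range(len(strand) - k + 1):
            if strand[i:i + k] in oligo_kmers:
                return True
    return False


if __name__ == "__main__":
    reference = "ATGGCCAAGTTCGA"  # toy sequence, not a NOVX sequence
    print(shares_contiguous_run("GCCAAG", reference))
    print(shares_contiguous_run(reverse_complement(reference)[:6], reference))
    print(shares_contiguous_run("AAAAAA", reference))
```

An exact-match window search of this kind is much stricter than real hybridization, which tolerates mismatches depending on the conditions used.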

In a further aspect, the invention provides a method for modulating the activity of a
NOVX polypeptide by contacting a cell sample that includes the NOVX polypeptide with a
compound that binds to the NOVX polypeptide in an amount sufficient to modulate the
activity of said polypeptide. The compound can be, e.g., a small molecule, such as a nucleic
acid, peptide, polypeptide, peptidomimetic, carbohydrate, lipid or other organic (carbon
containing) or inorganic molecule, as further described herein.

In another embodiment, the invention involves a method for identifying a potential
therapeutic agent for use in treatment of a pathology, wherein the pathology is related to
aberrant expression or aberrant physiological interactions of a polypeptide with an amino acid
sequence selected from the group consisting of SEQ ID NO:2n, wherein n is an integer
between 1 and 172, the method including providing a cell expressing the polypeptide of the
invention and having a property or function ascribable to the polypeptide; contacting the cell
with a composition comprising a candidate substance; and determining whether the substance
alters the property or function ascribable to the polypeptide; whereby, if an alteration
observed in the presence of the substance is not observed when the cell is contacted with a
composition devoid of the substance, the substance is identified as a potential therapeutic
agent.

Also within the scope of the invention is the use of a therapeutic in the manufacture of
a medicament for treating or preventing disorders or syndromes including, e.g.,
adrenoleukodystrophy, congenital adrenal hyperplasia, hemophilia, hypercoagulation,
hypogonadism, idiopathic thrombocytopenic purpura, autoimmune disease, inflammatory
bowel disease (IBD), rheumatoid arthritis, osteoarthritis, psoriasis, allergies, asthma,
immunodeficiencies, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke,
tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy,
epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, schizophrenia, depression,
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain,
obesity, diabetes, renal artery stenosis, interstitial nephritis, glomerulonephritis, polycystic
kidney disease, systemic lupus erythematosus, renal tubular acidosis, IgA nephropathy,
emphysema, scleroderma, adult respiratory distress syndrome (ARDS), lymphedema, graft
versus host disease (GVHD), pancreatitis, ulcers, anemia, ataxia-telangiectasia, cancer,
trauma, viral infections, bacterial infections, parasitic infections; and conditions related to
transplantation, neuroprotection, fertility, or regeneration (in vitro and in vivo) and/or other
pathologies and disorders of the like. Also within the scope of the invention is the use of a
therapeutic in the manufacture of a medicament for treating or preventing conditions
including, e.g., those associated with homologs of a NOVX sequence, such as those listed in
Table A.

The therapeutic can be, e.g., a NOVX nucleic acid, a NOVX polypeptide, or a
NOVX-specific antibody, or biologically-active derivatives or fragments thereof.

For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from the diseases and disorders disclosed above and/or other
pathologies and disorders of the like. The polypeptides can be used as immunogens to
produce antibodies specific for the invention, and as vaccines. They can also be used to
screen for potential agonist and antagonist compounds. For example, a cDNA encoding
NOVX may be useful in gene therapy, and NOVX may be useful when administered to a
subject in need thereof.

The invention further includes a method for screening for a modulator of disorders or
syndromes including, e.g., the diseases and disorders disclosed above and/or other
pathologies and disorders of the like. The method includes contacting a test compound with a
NOVX polypeptide and determining if the test compound binds to said NOVX polypeptide.
Binding of the test compound to the NOVX polypeptide indicates the test compound is a
modulator of activity, or of latency or predisposition to the aforementioned disorders or
syndromes.

Also within the scope of the invention is a method for screening for a modulator of
activity, or of latency or predisposition to disorders or syndromes including, e.g., the diseases
and disorders disclosed above and/or other pathologies and disorders of the like by
administering a test compound to a test animal at increased risk for the aforementioned
disorders or syndromes. The test animal expresses a recombinant polypeptide encoded by a
NOVX nucleic acid. Expression or activity of NOVX polypeptide is then measured in the
test animal, as is expression or activity of the protein in a control animal which
recombinantly-expresses NOVX polypeptide and is not at increased risk for the disorder or
syndrome. Next, the expression of NOVX polypeptide in both the test animal and the control
animal is compared. A change in the activity of NOVX polypeptide in the test animal
relative to the control animal indicates the test compound is a modulator of latency of the
disorder or syndrome.

In yet another aspect, the invention includes a method for determining the presence of
or predisposition to a disease associated with altered levels of a NOVX polypeptide, a NOVX
nucleic acid, or both, in a subject (e.g., a human subject). The method includes measuring the
amount of the NOVX polypeptide in a test sample from the subject and comparing the
amount of the polypeptide in the test sample to the amount of the NOVX polypeptide present
in a control sample. An alteration in the level of the NOVX polypeptide in the test sample as
compared to the control sample indicates the presence of or predisposition to a disease in the
subject. Preferably, the predisposition includes, e.g., the diseases and disorders disclosed
above and/or other pathologies and disorders of the like. Also, the expression levels of the
new polypeptides of the invention can be used in a method to screen for various cancers as
well as to determine the stage of cancers.
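
The test-versus-control comparison described above can be sketched as a simple fold-change check. This is a minimal illustration under stated assumptions: the function name `level_is_altered` is hypothetical, the measured amounts are assumed to be in the same arbitrary units, and the two-fold cutoff is an arbitrary placeholder, since the passage does not say how large an alteration must be.

```python
def level_is_altered(test_level: float, control_level: float,
                     fold_threshold: float = 2.0) -> bool:
    """True if the test level differs from control by at least fold_threshold.

    Assumptions: both levels are in the same units, the control level is
    positive, and a change counts in either direction (increase or decrease).
    The default two-fold cutoff is illustrative only.
    """
    if test_level < 0 or control_level <= 0:
        raise ValueError("levels must be non-negative and control must be positive")
    if fold_threshold <= 1.0:
        raise ValueError("fold_threshold must be greater than 1")
    ratio = test_level / control_level
    return ratio >= fold_threshold or ratio <= 1.0 / fold_threshold


if __name__ == "__main__":
    # Hypothetical measurements in arbitrary units.
    print(level_is_altered(10.0, 4.0))  # 2.5-fold higher than control
    print(level_is_altered(5.0, 4.0))   # within the two-fold band
    print(level_is_altered(1.0, 4.0))   # four-fold lower than control
```

In practice a decision rule of this kind would need replicate measurements and a statistical test; the single-ratio check is only meant to make the comparison step concrete.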

In a further aspect, the invention includes a method of treating or preventing a
pathological condition associated with a disorder in a mammal by administering to a
subject (e.g., a human subject) a NOVX polypeptide, a NOVX nucleic acid, or a
NOVX-specific antibody in an amount sufficient to alleviate or prevent the pathological
condition. In preferred embodiments, the disorder includes, e.g., the diseases and disorders
disclosed above and/or other pathologies and disorders of the like.

In yet another aspect, the invention can be used in a method to identify the cellular
receptors and downstream effectors of the invention by any one of a number of techniques
commonly employed in the art. These include but are not limited to the two-hybrid system,
affinity purification, co-precipitation with antibodies or other specific-interacting molecules.

NOVX nucleic acids and polypeptides are further useful in the generation of
antibodies that bind immuno-specifically to the novel NOVX substances for use in
therapeutic or diagnostic methods. These NOVX antibodies may be generated according to
methods known in the art, using prediction from hydrophobicity charts, as described in the
"Anti-NOVX Antibodies" section below. The disclosed NOVX proteins have multiple
hydrophilic regions, each of which can be used as an immunogen. These NOVX proteins can
be used in assay systems for functional analysis of various human disorders, which will help
in understanding of pathology of the disease and development of new drug targets for various
disorders.

The NOVX nucleic acids and proteins identified here may be useful in potential
therapeutic applications implicated in (but not limited to) various pathologies and disorders as
indicated below. The potential therapeutic applications for this invention include, but are not
limited to: protein therapeutic, small molecule drug target, antibody target (therapeutic,
diagnostic, drug targeting/cytotoxic antibody), diagnostic and/or prognostic marker, gene
therapy (gene delivery/gene ablation), research tools, tissue regeneration in vivo and in vitro
of all tissues and cell types composing (but not limited to) those defined here.

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can
be used in the practice or testing of the present invention, suitable methods and materials are
described below. All publications, patent applications, patents, and other references
mentioned herein are incorporated by reference in their entirety. In the case of conflict, the
present specification, including definitions, will control. In addition, the materials, methods,
and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following
detailed description and claims.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides novel nucleotides and polypeptides encoded thereby.
Included in the invention are the novel nucleic acid sequences, their encoded polypeptides,
antibodies, and other related compounds. The sequences are collectively referred to herein as
"NOVX nucleic acids" or "NOVX polynucleotides" and the corresponding encoded
polypeptides are referred to as "NOVX polypeptides" or "NOVX proteins." Unless indicated
otherwise, "NOVX" is meant to refer to any of the novel sequences disclosed herein. Table
A provides a summary of the NOVX nucleic acids and their encoded polypeptides.

TABLE A. Sequences and Corresponding SEQ ID Numbers 



NOVX 
Assignment 


Internal 

111 IWI 1 lul 

Identification 


SEO ID NO 
(nucleic acid) 


SEO ID NO 
(amino acid) 


Homology 


la 


CGI 01 036-01 


1 


2 


Nuclear Protein 


2a 


CG10 1055-01 


3 


4 


TVsmcfrvrm in rr s»r*irlir > rr»ilA/H 
1 l a\ IblUI I il Ul£> dLIUlL LU1ICU- 

coil-containing protein 


2b 


CG 101055-02 


5 


6 




3a 


CG101973-01 


7 


8 


i nyroiQ norrnone rvecepior 
Interactor 6 


3b. 


CG101973-02 


9 


10 




4a 


CGI 02244-01 


11 


12 


Uroporphyrinogen-H I 


HQ 




i j 


l & 






CCi 1097^1 -01 




16 
1 O 


iiuraceiiuiar 


JU 


rni097iiJ^9 


17 


1 0 




6a 


CG 102975-01 


19 


20 


Lriivi L/omain i ranscripuon 




CGI 02975-02 


21 


22 




7a 


CGI 03764-01 


23 


24 


V Ullage UCpCIIUCl 11 CalLIUIIl 

channel nnhlir 


7b 


CGI 03764-01 


25 


26 


OihvHrnnvriHtnf-^enQitiv/p I _ 

l^llljrwi UL/jr 1 lulliv OGllolllVC 1—1 

type, Calcium Channel Beta-3 
Subunit-like Proteins 


7c 


212779035 


27 


28 


Dihydropyridine-Sensitive L- 
type, Calcium Channel Beta-3 
Subunit-like Proteins 


7d 


CG 103764-02 


29 


30 




7e 


CGI 03 764-03 


31 


32 




8a 


CG 104944-01 


33 


34 


Nucleoporin 


9a 


CGI 06550-01 


35 


36 




9b 


CGI 06550-02 


37 


38 


BHLH Protein DEC2 


10a 


CGI 06842-01 


39 


40 


Keratin 1 8 


10b 


CGI 06842-02 


41 


42 




1 la 


CG 107095-01 


43 


44 


Intracellular 


12a 


CGI 07477-01 


45 


46 




12b 


CGI 07477-02 


47 


48 


Intracellular protein 


13a 


CGI 08707-01 


49 


50 


Intracellular Protein Tubby 


14a 


CGI 08791 -01 


51 


52 


Symplekin 


15a 


CGI 09247-01 


53 


53 




15b 


CGI 09247-02 


55 


56 


Telethonin 


16a 


CGI 10410-01 


57 


58 


Forkhead Protein 03A 


17a 


CGI 10882-01 


59 


60 


Cytochrome C 


18a 


CGI 11 188-01 


61 


62 


kinectin 


18b 


CGI 11 188-02 


63 


64 




18c 


CGI 11 188-03 


65 


66 




19a 


CGI 11473-01 


67 


68 


troponin T-like protein 


19b 


CGI 11473-02 


69 


70 




20a 


CGI 11501-01 


71 


72 


XIN 
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20b 


CGI 11501-02 


73 


74 


XIN 


20c 


249257832 


75 


76 


XI1M 


20d 


249263153 


77 


78 


XIN 


20e 


249263166 


79 


80 


XIN 


20f 


249263170 


81 


82 


XIN 


20g 


CGI 11501-03 


83 


84 




21a 


CGI 12595-01 


85 


86 


Prostatic Binding Protein 


21b 


CGI 12595-02 


87 


88 




zza 


pp. i \ *)£s>a m 


SO 


Oft 


Cytoplasmic Protein like homo 
sapiens 


23 a 


CGI 13823-01 


91 


92 


Zinc Finger Protein HZF1 


23b 


CGI 13823-02 


93 


94 




24a 


CGI 14098-01 


95 


96 


B4-2 




TO! 1430R-01 


97 


98 


Maternal Effect Protein 
Staufen 


25b 


CGI 14308-02 


99 


100 




26a 


CGI 14349-01 


101 


102 




26b 


CGI 14349-02 


103 


104 


desmin like homo sapiens 


26c 


CGI 14349-03 


105 


106 




27a 


CGI 14503-01 


107 


108 


hypothetical protein 


27b 


CGI 14503-02 


109 


110 




28a 


CGI 14588-01 


111 


112 


Tropomyosin Alpha Chain 


28b 


CGI 14588-02 


113 


114 




28c 


CGI 14588-03 


115 


116 




28d 


CGI 14588-04 


117 


118 




28e 


CGI 14588-05 


119 


120 




28 f 


CGI 14588-06 


121 


122 




29a 


CGI 14621-01 


123 


124 


Hermansky-Pudlak Syndrome 


29b 


CGI 14621-02 


125 


126 




30a 


CGI 14649-01 


127 


128 


NOT2P 


31a 


CGI 16785-01 


129 


130 


Human selenium-binding 


32a 


CGI 18927-01 


131 


132 






CC, 1 1 S077 CO 


i j j 


i id 

1 J4 


EH Domain-Binding Mitotic 
Phosphprotein 


33a 


CGI 18981-01 


135 


136 


Hypothetical Intracellular 


34a 


CGI 19385-01 


137 


138 


MHC class I region proline 
rich protein 


35a 


CGI 19566-01 


139 


140 


Nebullin 


35b 


CGI 19566-02 


141 


142 




35c 


CGI 19566-03 


143 


144 




36a 


CGI 20 166-01 


145 


146 




36b 


CGI 20 166-02 


147 


148 


Golgi matrix protein GM130 


37a 


CGI 20401 -01 


149 


150 




37b 


CGI 20401 -02 


151 


152 


Microspherule Protein 1 


37c 


CGI 2040 1-03 


153 


154 




38a 


CGI 22 125-01 


155 


156 


AK016419 Mus musculus 
adult male testis cDNA 


39a 


CGI 22 195-01 


157 


158 


Utrophin (Dystrophin-Related 
Protein 1) 


40a 


CGI 22738-01 


159 


160 


TPR domain 


41a 


CG123451-0I 


161 


162 


Intracellular Protein 
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v~Kj 1 ZJOOU-U I 


1 AT 


1 </t 
1 OH 


Transforming Acidic Coilecl- 
Coil-ContainingJVotein 2 


42b 


CGI 23660-02 


165 


166 




43a 


CGI 23955-01 


167 


168 


Zinc finger protein 


44a 


CGI 24672-01 


169 


170 




44b 


CGI 24672-03 


171 


172 


LRR Domain Containing like 
homo sapiens 


44C 


HjIz40/z-Uz 


1 11 

1 15 


1 "7/1 


LRR Domain Containing like 
homo sapiens 


45a 


CGI 25900-01 


175 


176 


G-Rich Sequence Factor- 1 


46a 


CGI 265 10-01 


177 


178 


Cytoplasmic protein 


47a 


CGI 27 106-01 


179 


180 


Meningioma-Expressed 
Antigen 6/1 1 (MEA6) 
(MEA11) 


48a 


CGI 27340-01 


181 


182 




48b 


CGI 27340-02 


183 


184 


Ancient Conserved Domain 
Protein 1 


49a 


CGI 283 10-01 


185 


186 


CDCRL 


49b 


CGI 283 10-02 


187 


188 




50a 


CGI 28369-01 


189 


190 


HPRP 


50b 


CGI 28369-02 


191 


192 




50c 


CGI 28369-03 


193 


194 




51a 


CGI 28420-01 


195 


196 


PIBF1 protein 


52a 


CGI 285 19-01 


197 


198 


Cytoplasmic protein 


52b 


CGI 285 19-02 


199 


200 




53a 


CGI 28626-01 


201 


202 


Zinc Finger / KRAB domain 
containing Protein 


54a 


CGI 28852-01 


203 


204 


RHO-Interacting Protein 3 


54b 


CGI 28852-02 


205 


206 




55a 


CGI 32650-01 


207 


208 




55b 


CGI 32650-05 


209 


210 


cardiac troponin I 


55c 


CGI 32650-02 


211 


212 




55d 


CGI 32650-03 


213 


214 




55e 


CGI 32650-04 


215 


216 




56a 


CGI 33808-01 


217 


218 




57a 


CG 136288-01 


219 


220 


Guanine Nucleotide-Binding 
Protein Gamma-7 Subunit like 
homo sapiens 


CD, 

joa 




i 

ll\ 


ILL 


2410017P07RIKIike homo 
sapiens 


59a 


CGI 36942-01 


223 


224 


FLJ20565 like homo sapiens 


60a 


CGI37017-01 


225 


226 


CGI-27 like homo sapiens 


61a 


CGI37I46-0I 


227 


228 


GTP-Binding Protein SARI A 
like homo sapiens 


62a 


CGI 3 7566-01 


229 


230 


Acyltransferase like homo 
sapiens 


62b 


CGI 37566-02 


231 


232 




63a 


CGI 37707-01 


233 


234 


Benzodiazpine receptor 
(BZRP) like homo sapiens 


63 b 


CGI 37707-02 


235 


236 




64a 


CGI 38033-01 


237 


238 


Ankyrin-repeat containing 
protein like homo sapiens 
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65a 


CGI 38043-01 


239 


240 


Muitidomain Presynaptic 
Cytomatrix Protein Piccolo 
like homo sapiens 


66a 


CGI 38208-01 


241 


242 


Cytosolic Soiling Protein 
PACS-IA-Iike 


66b 


CGI 38208-02 


243 


244 


Cytosolic Sorting Protein 
PACS-IA-Iike 


67a 


CGI 38303-01 


245 


246 


Formin 2 like homo sapiens 


68a 


CGI 38362-01 


247 


248 


Novel 5' nucleotidase-like 
Proteins 


69a 


CGI 38452-01 


249 


250 


Novel Intracellular F-box 
domain containing protein-like 
Proteins and Nucleic Acids 
Encoding Same 


70a 


CGI 3878 1-01 


251 


252 


Novel HTPHLP Gene Like- 
like Proteins and Nucleic 
Acids Encoding Same 


71a 


CGI 38808-01 


253 


254 


Novel S1F AND TIAM I -Like 
Exchange Factor-like Proteins 
and Nucleic Acids Encoding 
Same 


/za 






256 


WW domain containing 
protein like homo sapiens 


72b 


CGI 39224-02 


257 


258 




72c 


CGI 39224-03 


259 


260 




73a 


CGI 40088-01 


261 


262 


Gasdermin like homo sapiens 


73b 


CGI 40088-02 


263 


264 




74a 


CGI 40 170-01 


265 


266 


Tubby Super-Family Protein 
Splice Variant like homo 
sapiens 


/ DSi 


PP 1 ACi 1 70 n 1 


7A7 
ZD/ 


ZOO 


Synaptotagmin-Like Protein 3- 
A like homo sapiens 


/oa 




7 AO 

zoy 


inn 
Z /U 


Hypothetical Intracellular like 
homo sapiens 


77a 


CGI 40727-01 


271 


272 


Copine I like homo sapiens 


78a 


PP. 1 4 1 fi7fLfi t 


273 


274 


Selenoprotein X 1 like homo 
sapiens 


79a 




275 


276 


Hypothetical WD-repeat like 
homo sapiens 


80a 


CG191018-01 


277 


278 


cytoplasmic protein 


81a 


CG56125-01 


279 


280 




81b 


CG56125-02 


281 


282 


TNFAIP1 


82a 


CG57113-01 


283 


284 




82b 


CG57113-03 


285 


286 


Ribosomal Protein L29 


82c 


CG57113-02 


287 


288 




83a 


CG59536-01 


289 


290 




83b 


CG59536-02 


291 


292 


Paraneoplastic Antigen 


84a 


CG59794-01 


293 


294 


GTF2IRD2 like homo sapiens 


84b 


CG59794-02 


295 


296 


GTF2IRD2 like homo sapiens 


84c 


CG59794-03 


297 


298 




85a 


CG59821-01 


299 


300 


Intracellular Protein like homo 
sapiens 


85b 


CG59821-02 


301 


302 


Intracellular Protein like homo 
sapiens 


86a 


CG59849-01 


303 


304 




86b 


mA.Q9.AQ C\0 


305 




Intracellular Protein like homo 
sapiens 


87a 


CG59920-01 


307 


308 


Glycolipid Transfer Protein 


87b 


CG59920-02 


309 


310 


Glycolipid Transfer Protein 


87c 


277583551 


311 


312 


Glycolipid Transfer Protein 


87d 


CG59920-0I 


313 


314 


Glycolipid Transfer Protein 


87e 


308559628 


315 


316 


Glycolipid Transfer Protein 


88a 




317 


318 


Novel Copine VII-like 
Proteins 


88b 




319 




Novel Copine VII-like 
Proteins 


89a 


CG93335-01 


321 


322 




89b 


CG93335-02 


323 


324 


Intracellular protein 


90a 


CG94377-01 


325 


326 




90b 


CG94377-02 


327 


328 


Sperm Membrane Protein BS- 
63 


91a 


CG97090-01 


329 


330 




91b 


CG97090-04 


331 


332 


FIP-2 


91c 


CG97090-03 


333 


334 


FIP-2 


91d 


CG97090-02 


335 


336 




92a 


CG97966-01 


337 


338 




92b 


CG97966-03 


339 


340 


PEX10 


92c 


CG97966-02 


341 


342 




92d 


CG97966-04 


343 


344 





Table A indicates the homology of NOVX polypeptides to known protein families. 
Thus, the nucleic acids and polypeptides, antibodies and related compounds according to the 
invention corresponding to a NOVX as identified in column 1 of Table A will be useful in 
therapeutic and diagnostic applications implicated in, for example, pathologies and disorders 
associated with the known protein families identified in column 5 of Table A. 

Pathologies, diseases, disorders, conditions and the like that are associated with 
NOVX sequences include, but are not limited to, e.g., cardiomyopathy, atherosclerosis, 
hypertension, congenital heart defects, aortic stenosis, atrial septal defect (ASD), 
atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, 
ventricular septal defect (VSD), valve diseases, tuberous sclerosis, scleroderma, obesity, 
metabolic disturbances associated with obesity, transplantation, adrenoleukodystrophy, 
congenital adrenal hyperplasia, prostate cancer, pancreatic cancer, gastric cancer, colon 
cancer, liver cancer, renal cancer, breast cancer, ovarian cancer, squamous 
cell carcinoma, melanoma, brain cancer, allergies, asthma, emphysema, inflammatory bowel 
disease, rheumatoid arthritis, osteoarthritis, lupus erythematosus, diabetes, metabolic 
disorders, neoplasm, adenocarcinoma, lymphoma, uterus cancer, fertility, hemophilia, 
hypercoagulation, idiopathic thrombocytopenic purpura, immunodeficiencies, graft versus 
host disease, AIDS, bronchial asthma, Crohn's disease, multiple sclerosis, schizophrenia, 
depression, Albright Hereditary Osteodystrophy, infectious disease, anorexia, 
cancer-associated cachexia, cancer, neurodegenerative disorders, epilepsy, Alzheimer's 
Disease, Parkinson's Disorder, Huntington's Disease, immune disorders, hematopoietic 
disorders, and the various dyslipidemias, the metabolic syndrome X and wasting disorders 
associated with chronic diseases and various cancers, as well as conditions such as 
transplantation and fertility. 

NOVX nucleic acids and their encoded polypeptides are useful in a variety of 
applications and contexts. The various NOVX nucleic acids and polypeptides according to 
the invention are useful as novel members of the protein families according to the presence of 
domains and sequence relatedness to previously described proteins. Additionally, NOVX 
nucleic acids and polypeptides can also be used to identify proteins that are members of the 
family to which the NOVX polypeptides belong. 

Consistent with other known members of the family of proteins, identified in column 
5 of Table A, the NOVX polypeptides of the present invention show homology to, and 
contain domains that are characteristic of, other members of such protein families. Details of 
the sequence relatedness and domain analysis for each NOVX are presented in Example A. 

The NOVX nucleic acids and polypeptides can also be used to screen for molecules 
that inhibit or enhance NOVX activity or function. Specifically, the nucleic acids and 
polypeptides according to the invention may be used as targets for the identification of small 
molecules that modulate or inhibit diseases associated with the protein families listed in 
Table A. 

The NOVX nucleic acids and polypeptides are also useful for detecting specific cell 
types. Details of the expression analysis for each NOVX are presented in Example C. 
Accordingly, the NOVX nucleic acids, polypeptides, antibodies and related compounds 
according to the invention will have diagnostic and therapeutic applications in the detection 
of a variety of diseases with differential expression in normal vs. diseased tissues, e.g., 
detection of a variety of cancers. 

Additional utilities for NOVX nucleic acids and polypeptides according to the 
invention are disclosed herein. 

NOVX clones 

NOVX nucleic acids and their encoded polypeptides are useful in a variety of 
applications and contexts. The various NOVX nucleic acids and polypeptides according to 
the invention are useful as novel members of the protein families according to the presence of 
domains and sequence relatedness to previously described proteins. Additionally, NOVX 
nucleic acids and polypeptides can also be used to identify proteins that are members of the 
family to which the NOVX polypeptides belong. 

The NOVX genes and their corresponding encoded proteins are useful for preventing, 
treating or ameliorating medical conditions, e.g., by protein or gene therapy. Pathological 
conditions can be diagnosed by determining the amount of the new protein in a sample or by 
determining the presence of mutations in the new genes. Specific uses are described for each 
of the NOVX genes, based on the tissues in which they are most highly expressed. Uses 
include developing products for the diagnosis or treatment of a variety of diseases and 
disorders. 

The NOVX nucleic acids and proteins of the invention are useful in potential 
diagnostic and therapeutic applications and as a research tool. These include serving as a 
specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the 
presence or amount of the nucleic acid or the protein is to be assessed, as well as potential 
therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule 
drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic 
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) a 
composition promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense 
weapon. 

In one specific embodiment, the invention includes an isolated polypeptide 
comprising an amino acid sequence selected from the group consisting of: (a) a mature form 
of the amino acid sequence selected from the group consisting of SEQ ID NO: 2n, wherein n 
is an integer between 1 and 172; (b) a variant of a mature form of the amino acid sequence 
selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 
172, wherein any amino acid in the mature form is changed to a different amino acid, 
provided that no more than 15% of the amino acid residues in the sequence of the mature 
form are so changed; (c) an amino acid sequence selected from the group consisting of SEQ 
ID NO: 2n, wherein n is an integer between 1 and 172; (d) a variant of the amino acid 
sequence selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer 
between 1 and 172, wherein any amino acid specified in the chosen sequence is changed to a 
different amino acid, provided that no more than 15% of the amino acid residues in the 
sequence are so changed; and (e) a fragment of any of (a) through (d). 
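
By way of non-limiting illustration only, the 15% limit recited above can be checked computationally. The following Python sketch assumes a substitution-only variant having the same length as the reference sequence; the function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Return the fraction of residues that differ between two equal-length sequences."""
    if not reference:
        raise ValueError("reference sequence is empty")
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison requires equal lengths")
    changed = sum(1 for ref, var in zip(reference, variant) if ref != var)
    return changed / len(reference)


def is_permitted_variant(reference: str, variant: str, limit: float = 0.15) -> bool:
    """True when no more than `limit` of the residues are changed."""
    return fraction_changed(reference, variant) <= limit


# Hypothetical 20-residue sequence and a variant with two substitutions.
reference = "MKTAYIAKQRQISFVKSHFS"
variant = "MKTAYIAKQRQISFVKAHFA"
print(fraction_changed(reference, variant))
print(is_permitted_variant(reference, variant))
```

Insertions and deletions are deliberately outside the scope of this sketch; handling them would require an alignment step that the text does not specify.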

In another specific embodiment, the invention includes an isolated nucleic acid 
molecule comprising a nucleic acid sequence encoding a polypeptide comprising an amino 
acid sequence selected from the group consisting of: (a) a mature form of the amino acid 
sequence given in SEQ ID NO: 2n, wherein n is an integer between 1 and 172; (b) a variant of a 
mature form of the amino acid sequence selected from the group consisting of SEQ ID NO: 
2n, wherein n is an integer between 1 and 172, wherein any amino acid in the mature form of 
the chosen sequence is changed to a different amino acid, provided that no more than 15% of 
the amino acid residues in the sequence of the mature form are so changed; (c) the amino acid 
sequence selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer 
between 1 and 172; (d) a variant of the amino acid sequence selected from the group 
consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 172, in which any amino 
acid specified in the chosen sequence is changed to a different amino acid, provided that no 
more than 15% of the amino acid residues in the sequence are so changed; (e) a nucleic acid 
fragment encoding at least a portion of a polypeptide comprising the amino acid sequence 
selected from the group consisting of SEQ ID NO: 2n, wherein n is an integer between 1 and 
172, or any variant of said polypeptide wherein any amino acid of the chosen sequence is 
changed to a different amino acid, provided that no more than 10% of the amino acid residues 
in the sequence are so changed; and (f) the complement of any of said nucleic acid molecules. 

In yet another specific embodiment, the invention includes an isolated nucleic acid 
molecule, wherein said nucleic acid molecule comprises a nucleotide sequence selected from 
the group consisting of: (a) the nucleotide sequence selected from the group consisting of 
SEQ ID NO: 2n-1, wherein n is an integer between 1 and 172; (b) a nucleotide sequence 
wherein one or more nucleotides in the nucleotide sequence selected from the group 
consisting of SEQ ID NO: 2n-1, wherein n is an integer between 1 and 172, is changed from 
that selected from the group consisting of the chosen sequence to a different nucleotide, 
provided that no more than 15% of the nucleotides are so changed; (c) a nucleic acid 
fragment of the sequence selected from the group consisting of SEQ ID NO: 2n-1, wherein n 
is an integer between 1 and 172; and (d) a nucleic acid fragment wherein one or more 
nucleotides in the nucleotide sequence selected from the group consisting of SEQ ID NO: 
2n-1, wherein n is an integer between 1 and 172, is changed from that selected from the 
group consisting of the chosen sequence to a different nucleotide, provided that no more than 
15% of the nucleotides are so changed. 
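
For illustration only, the numbering convention used above pairs each nucleotide sequence (SEQ ID NO: 2n-1) with its encoded polypeptide (SEQ ID NO: 2n) for n between 1 and 172. The following Python sketch shows the arithmetic; the helper names are hypothetical.

```python
N_MAX = 172  # upper bound on n recited in the text


def seq_id_pair(n: int) -> tuple:
    """Return (nucleic acid SEQ ID NO, polypeptide SEQ ID NO) for a given n."""
    if not 1 <= n <= N_MAX:
        raise ValueError("n must be an integer between 1 and 172")
    return 2 * n - 1, 2 * n


def n_from_seq_id(seq_id: int) -> int:
    """Recover n from either member of a SEQ ID NO pair."""
    if not 1 <= seq_id <= 2 * N_MAX:
        raise ValueError("SEQ ID NO out of range")
    return (seq_id + 1) // 2


print(seq_id_pair(112))    # the pair listed in Table A as 223 and 224
print(n_from_seq_id(344))  # the last polypeptide entry in Table A
```

Under this convention, odd SEQ ID NOs denote nucleotide sequences and even SEQ ID NOs denote polypeptides.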

NOVX Nucleic Acids and Polypeptides 

One aspect of the invention pertains to isolated nucleic acid molecules that encode 
NOVX polypeptides or biologically active portions thereof. Also included in the invention 
are nucleic acid fragments sufficient for use as hybridization probes to identify 
NOVX-encoding nucleic acids (e.g., NOVX mRNAs) and fragments for use as PCR primers 
for the amplification and/or mutation of NOVX nucleic acid molecules. As used herein, the 
term "nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic 
DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using 
nucleotide analogs, and derivatives, fragments and homologs thereof. The nucleic acid 
molecule may be single-stranded or double-stranded, but preferably is comprised of 
double-stranded DNA. 



A NOVX nucleic acid can encode a mature NOVX polypeptide. As used herein, a 
"mature" form of a polypeptide or protein disclosed in the present invention is the product of 
a naturally occurring polypeptide or precursor form or proprotein. The naturally occurring 
polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full-length 
gene product encoded by the corresponding gene. Alternatively, it may be defined as the 
polypeptide, precursor or proprotein encoded by an ORF described herein. The product 
"mature" form arises, by way of nonlimiting example, as a result of one or more naturally 
occurring processing steps that may take place within the cell (e.g., host cell) in which the 
gene product arises. Examples of such processing steps leading to a "mature" form of a 
polypeptide or protein include the cleavage of the N-terminal methionine residue encoded by 
the initiation codon of an ORF, or the proteolytic cleavage of a signal peptide or leader 
sequence. Thus a mature form arising from a precursor polypeptide or protein that has 
residues 1 to N, where residue 1 is the N-terminal methionine, would have residues 2 through 
N remaining after removal of the N-terminal methionine. Alternatively, a mature form 
arising from a precursor polypeptide or protein having residues 1 to N, in which an 
N-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues 
from residue M+1 to residue N remaining. Further as used herein, a "mature" form of a 
polypeptide or protein may arise from a step of post-translational modification other than a 
proteolytic cleavage event. Such additional processes include, by way of non-limiting 
example, glycosylation, myristylation or phosphorylation. In general, a mature polypeptide 
or protein may result from the operation of only one of these processes, or a combination of 
any of them. 
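
By way of non-limiting illustration, the two cleavage cases described above (residues 2 through N after removal of the N-terminal methionine, and residues M+1 through N after removal of a signal sequence) can be expressed as simple string operations. The Python sketch below uses a hypothetical precursor and a hypothetical cleavage position M; it does not predict where a signal peptide actually ends.

```python
def remove_initiator_methionine(precursor: str) -> str:
    """Return residues 2 through N of a precursor whose residue 1 is methionine."""
    if not precursor.startswith("M"):
        raise ValueError("precursor does not begin with methionine")
    return precursor[1:]


def cleave_signal_sequence(precursor: str, m: int) -> str:
    """Return residues M+1 through N after a signal sequence (residues 1..M) is removed."""
    if not 1 <= m < len(precursor):
        raise ValueError("M must satisfy 1 <= M < N")
    return precursor[m:]


precursor = "MKLLVAAQGS"  # hypothetical 10-residue precursor
print(remove_initiator_methionine(precursor))  # residues 2..10
print(cleave_signal_sequence(precursor, 4))    # residues 5..10, assuming M = 4
```

Post-translational modifications such as glycosylation or phosphorylation do not change the residue string and are therefore not modeled here.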

The term "probe", as utilized herein, refers to nucleic acid sequences of variable 
length, preferably between at least about 10 nucleotides (nt), about 100 nt, or as many as 
approximately, e.g., 6,000 nt, depending upon the specific use. Probes are used in the 
detection of identical, similar, or complementary nucleic acid sequences. Longer length 
probes are generally obtained from a natural or recombinant source, are highly specific, and 
are much slower to hybridize than shorter-length oligomer probes. Probes may be 
single-stranded or double-stranded and designed to have specificity in PCR, membrane-based 
hybridization technologies, or ELISA-like technologies. 

The term "isolated" nucleic acid molecule, as used herein, is a nucleic acid that is 
separated from other nucleic acid molecules which are present in the natural source of the 
nucleic acid. Preferably, an "isolated" nucleic acid is free of sequences which naturally flank 
the nucleic acid (i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the 
genomic DNA of the organism from which the nucleic acid is derived. For example, in 
various embodiments, the isolated NOVX nucleic acid molecules can contain less than about 
5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank 
the nucleic acid molecule in genomic DNA of the cell/tissue from which the nucleic acid is 
derived (e.g., brain, heart, liver, spleen, etc.). Moreover, an "isolated" nucleic acid molecule, 
such as a cDNA molecule, can be substantially free of other cellular material, or culture 
medium, or of chemical precursors or other chemicals. 

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the 
nucleotide sequence of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, or a 
complement of this nucleotide sequence, can be isolated using standard molecular biology 
techniques and the sequence information provided herein. Using all or a portion of the 
nucleic acid sequence of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, as a 
hybridization probe, NOVX molecules can be isolated using standard hybridization and 
cloning techniques (e.g., as described in Sambrook, et al., (eds.), MOLECULAR CLONING: A 
LABORATORY MANUAL 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
NY, 1989; and Ausubel, et al., (eds.), CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John 
Wiley & Sons, New York, NY, 1993.) 

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, 
genomic DNA, as a template with appropriate oligonucleotide primers according to standard 
PCR amplification techniques. The nucleic acid so amplified can be cloned into an 
appropriate vector and characterized by DNA sequence analysis. Furthermore, 
oligonucleotides corresponding to NOVX nucleotide sequences can be prepared by standard 
synthetic techniques, e.g., using an automated DNA synthesizer. 

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide 
residues. A short oligonucleotide sequence may be based on, or designed from, a genomic or 
cDNA sequence and is used to amplify, confirm, or reveal the presence of an identical, 
similar or complementary DNA or RNA in a particular cell or tissue. Oligonucleotides 
comprise a nucleic acid sequence having about 10 nt, 50 nt, or 100 nt in length, preferably 
about 15 nt to 30 nt in length. In one embodiment of the invention, an oligonucleotide 
comprising a nucleic acid molecule less than 100 nt in length would further comprise at least 
6 contiguous nucleotides of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, or 
a complement thereof. Oligonucleotides may be chemically synthesized and may also be 
used as probes. 
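
As a non-limiting sketch, the requirement that an oligonucleotide comprise at least 6 contiguous nucleotides of a reference sequence, or of its complement, can be tested by scanning the oligonucleotide for any 6-nucleotide window found in the reference or in its reverse complement. The function names and the sequences below are hypothetical and chosen only for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_contiguous_run(oligo: str, reference: str, k: int = 6) -> bool:
    """True if any k contiguous nucleotides of oligo occur in reference or its complement."""
    oligo = oligo.upper()
    targets = (reference.upper(), reverse_complement(reference))
    for i in range(len(oligo) - k + 1):
        window = oligo[i:i + k]
        if any(window in target for target in targets):
            return True
    return False


reference = "ATGGCCATTGTAATGGGCCGC"  # hypothetical reference sequence
print(shares_contiguous_run("TTTGCCATTGAAA", reference))                   # shares GCCATT
print(shares_contiguous_run(reverse_complement("GGCCATTGTA"), reference))  # complement strand
print(shares_contiguous_run("AAAAAAAAAA", reference))                      # no shared run
```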

In another embodiment, an isolated nucleic acid molecule of the invention comprises 
a nucleic acid molecule that is a complement of the nucleotide sequence shown in SEQ ID 
NO:2n-1, wherein n is an integer between 1 and 172, or a portion of this nucleotide sequence 
(e.g., a fragment that can be used as a probe or primer or a fragment encoding a 
biologically-active portion of a NOVX polypeptide). A nucleic acid molecule that is 
complementary to the nucleotide sequence of SEQ ID NO:2n-1, wherein n is an integer 
between 1 and 172, is one that is sufficiently complementary to the nucleotide sequence of 
SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, that it can hydrogen bond with 
few or no mismatches to the nucleotide sequence shown in SEQ ID NO:2n-1, wherein n is an 
integer between 1 and 172, thereby forming a stable duplex. 
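
For illustration only, "few or no mismatches" can be quantified by pairing each base of one strand with the antiparallel base of the candidate strand and counting the positions that are not Watson-Crick pairs. The Python sketch below assumes two equal-length DNA strands written 5' to 3'; the function name is hypothetical, and Hoogsteen pairing and duplex stability are not modeled.

```python
_WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_mismatches(strand: str, candidate: str) -> int:
    """Count non-Watson-Crick positions when candidate anneals antiparallel to strand."""
    strand, candidate = strand.upper(), candidate.upper()
    if len(strand) != len(candidate):
        raise ValueError("this sketch compares equal-length strands only")
    return sum(
        1
        for base, partner in zip(strand, reversed(candidate))
        if _WATSON_CRICK[base] != partner
    )


print(count_mismatches("ATGGCC", "GGCCAT"))  # perfect complement
print(count_mismatches("ATGGCC", "GGCCAA"))  # one mismatched position
```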

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen base 
pairing between nucleotide units of a nucleic acid molecule, and the term "binding" means 
the physical or chemical interaction between two polypeptides or compounds or associated 
polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van 
der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct 
or indirect. Indirect interactions may be through or due to the effects of another polypeptide 
or compound. Direct binding refers to interactions that do not take place through, or due to, 
the effect of another polypeptide or compound, but instead are without other substantial 
chemical intermediates. 

A "fragment" provided herein is defined as a sequence of at least 6 (contiguous) 
nucleic acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific 
hybridization in the case of nucleic acids or for specific recognition of an epitope in the case 
of amino acids, and is at most some portion less than a full length sequence. Fragments may 
be derived from any contiguous portion of a nucleic acid or amino acid sequence of choice. 

A full-length NOVX clone is identified as containing an ATG translation start codon 
and an in-frame stop codon. Any disclosed NOVX nucleotide sequence lacking an ATG 
start codon therefore encodes a truncated C-terminal fragment of the respective NOVX 
polypeptide, and requires that the corresponding full-length cDNA extend in the 5' direction 
of the disclosed sequence. Any disclosed NOVX nucleotide sequence lacking an in-frame 
stop codon similarly encodes a truncated N-terminal fragment of the respective NOVX 
polypeptide, and requires that the corresponding full-length cDNA extend in the 3' direction 
of the disclosed sequence. 
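
The classification just described can be sketched as follows, by way of non-limiting example. The Python code assumes that the disclosed sequence is supplied in the coding frame from its first nucleotide (no untranslated leader); that assumption, the function name and the returned labels are illustrative only. The case in which both the start and the stop codon are absent is not addressed in the text and is labeled here as an internal fragment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_clone(sequence: str) -> str:
    """Classify a coding-frame sequence by the presence of a start and an in-frame stop."""
    seq = sequence.upper()
    has_start = seq.startswith("ATG")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    has_stop = any(codon in STOP_CODONS for codon in codons)
    if has_start and has_stop:
        return "full-length"
    if has_stop:
        return "C-terminal fragment; extend 5'"
    if has_start:
        return "N-terminal fragment; extend 3'"
    return "internal fragment; extend 5' and 3'"


print(classify_clone("ATGAAACCCTAA"))
print(classify_clone("AAACCCTAA"))
print(classify_clone("ATGAAACCC"))
```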

A "derivative" is a nucleic acid sequence or amino acid sequence formed from the 
native compounds either directly, by modification or partial substitution. An "analog" is a 
nucleic acid sequence or amino acid sequence that has a structure similar to, but not identical 
to, the native compound, e.g., it differs from it in respect to certain components or side 
chains. Analogs may be synthetic or derived from a different evolutionary origin and may 
have a similar or opposite metabolic activity compared to wild type. A "homolog" is a 
nucleic acid sequence or amino acid sequence of a particular gene that is derived from 
different species. 

Derivatives and analogs may be full length or other than full length. Derivatives or 
analogs of the nucleic acids or proteins of the invention include, but are not limited to, 
molecules comprising regions that are substantially homologous to the nucleic acids or 
proteins of the invention, in various embodiments, by at least about 70%, 80%, or 95% 
identity (with a preferred identity of 80-95%) over a nucleic acid or amino acid sequence of 
identical size or when compared to an aligned sequence in which the alignment is done by a 
computer homology program known in the art, or whose encoding nucleic acid is capable of 
hybridizing to the complement of a sequence encoding the proteins under stringent, 
moderately stringent, or low stringent conditions. See e.g. Ausubel, et al., CURRENT 
PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, NY, 1993, and below. 

A "homologous nucleic acid sequence" or "homologous amino acid sequence," or 
variations thereof, refer to sequences characterized by a homology at the nucleotide level or 
amino acid level as discussed above. Homologous nucleotide sequences include those 
sequences coding for isoforms of NOVX polypeptides. Isoforms can be expressed in 
different tissues of the same organism as a result of, for example, alternative splicing of 
RNA. Alternatively, isoforms can be encoded by different genes. In the invention, 
homologous nucleotide sequences include nucleotide sequences encoding for a NOVX 
polypeptide of species other than humans, including, but not limited to: vertebrates, and thus 
can include, e.g., frog, mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. 
Homologous nucleotide sequences also include, but are not limited to, naturally occurring 
allelic variations and mutations of the nucleotide sequences set forth herein. A homologous 
nucleotide sequence does not, however, include the exact nucleotide sequence encoding 
human NOVX protein. Homologous nucleic acid sequences include those nucleic acid 
sequences that encode conservative amino acid substitutions (see below) in SEQ ID NO:2n-1, 
wherein n is an integer between 1 and 172, as well as a polypeptide possessing NOVX 
biological activity. Various biological activities of the NOVX proteins are described below. 

A NOVX polypeptide is encoded by the open reading frame ("ORF") of a NOVX 
nucleic acid. An ORF corresponds to a nucleotide sequence that could potentially be 
translated into a polypeptide. A stretch of nucleic acids comprising an ORF is uninterrupted 
by a stop codon. An ORF that represents the coding sequence for a full protein begins with 
an ATG "start" codon and terminates with one of the three "stop" codons, namely, TAA, 
TAG, or TGA. For the purposes of this invention, an ORF may be any part of a coding 
sequence, with or without a start codon, a stop codon, or both. For an ORF to be considered 
as a good candidate for coding for a bona fide cellular protein, a minimum size requirement is 
often set, e.g., a stretch of DNA that would encode a protein of 50 amino acids or more. 
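
By way of non-limiting illustration, a full-protein ORF as defined above (an ATG start codon, an uninterrupted run of codons, and a terminating TAA, TAG or TGA) can be located by scanning each of the three forward reading frames. The Python sketch below applies the 50 amino acid minimum mentioned in the text as a default; the function name is hypothetical, only the forward strand is scanned, and the test sequence is synthetic.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_aa: int = 50) -> list:
    """Return (start, end) index pairs of ATG...stop ORFs encoding at least min_aa residues.

    Indices are 0-based, `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                encoded_residues = (i - start) // 3  # codons before the stop
                if encoded_residues >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Synthetic example: 2 nt leader, ATG, 49 alanine codons, stop (50 residues encoded).
dna = "CC" + "ATG" + "GCT" * 49 + "TAA"
print(find_orfs(dna))
```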

The nucleotide sequences determined from the cloning of the human NOVX genes 
allow for the generation of probes and primers designed for use in identifying and/or cloning 
NOVX homologues in other cell types, e.g. from other tissues, as well as NOVX homologues 
from other vertebrates. The probe/primer typically comprises a substantially purified 
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence 
that hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 
300, 350 or 400 consecutive sense strand nucleotide sequence of SEQ ID NO:2n-1, wherein n 
is an integer between 1 and 172; or an anti-sense strand nucleotide sequence of SEQ ID 
NO:2n-1, wherein n is an integer between 1 and 172; or of a naturally occurring mutant of 
SEQ ID NO:2n-1, wherein n is an integer between 1 and 172. 

Probes based on the human NOVX nucleotide sequences can be used to detect 
transcripts or genomic sequences encoding the same or homologous proteins. In various 
embodiments, the probe has a detectable label attached, e.g. the label can be a radioisotope, a 
fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as a 
part of a diagnostic test kit for identifying cells or tissues which mis-express a NOVX 
protein, such as by measuring a level of a NOVX-encoding nucleic acid in a sample of cells 
from a subject, e.g., detecting NOVX mRNA levels or determining whether a genomic NOVX 
gene has been mutated or deleted. 

"A polypeptide having a biologically-active portion of a NOVX polypeptide" refers to 
polypeptides exhibiting activity similar, but not necessarily identical to, an activity of a 
polypeptide of the invention, including mature forms, as measured in a particular biological 
assay, with or without dose dependency. A nucleic acid fragment encoding a 
"biologically-active portion of NOVX" can be prepared by isolating a portion of SEQ ID 
NO:2n-1, wherein n is an integer between 1 and 172, that encodes a polypeptide having a 
NOVX biological activity (the biological activities of the NOVX proteins are described 
below), expressing the encoded portion of NOVX protein (e.g., by recombinant expression in 
vitro) and assessing the activity of the encoded portion of NOVX. 

NOVX Nucleic Acid and Polypeptide Variants 

The invention further encompasses nucleic acid molecules that differ from the 
nucleotide sequences of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, due to 
degeneracy of the genetic code and thus encode the same NOVX proteins as that encoded by 
the nucleotide sequences of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172. In 
another embodiment, an isolated nucleic acid molecule of the invention has a nucleotide 
sequence encoding a protein having an amino acid sequence of SEQ ID NO:2n, wherein n is 
an integer between 1 and 172. 
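
As a non-limiting sketch of degeneracy, two nucleotide sequences that differ at several positions can nevertheless encode the same polypeptide. The Python code below builds the standard genetic code from its conventional compact representation (bases ordered T, C, A, G) and translates two short synthetic sequences; the sequences and function name are hypothetical and are not NOVX sequences.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, read in frame from its first nucleotide."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


first = "ATGCTGAGCTAA"   # ATG CTG AGC TAA
second = "ATGTTATCATGA"  # ATG TTA TCA TGA (synonymous codons throughout)
print(translate(first), translate(second))
```

Both synthetic sequences translate to the same residues even though they differ at 6 of 12 nucleotide positions.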

In addition to the human NOVX nucleotide sequences of SEQ ID NO:2n-1, wherein n 
is an integer between 1 and 172, it will be appreciated by those skilled in the art that DNA 
sequence polymorphisms that lead to changes in the amino acid sequences of the NOVX 
polypeptides may exist within a population (e.g., the human population). Such genetic 
polymorphism in the NOVX genes may exist among individuals within a population due to 
natural allelic variation. As used herein, the terms "gene" and "recombinant gene" refer to 
nucleic acid molecules comprising an open reading frame (ORF) encoding a NOVX protein, 
preferably a vertebrate NOVX protein. Such natural allelic variations can typically result in 
1-5% variance in the nucleotide sequence of the NOVX genes. Any and all such nucleotide 
variations and resulting amino acid polymorphisms in the NOVX polypeptides, which are the 
result of natural allelic variation and that do not alter the functional activity of the NOVX 
polypeptides, are intended to be within the scope of the invention. 

Moreover, nucleic acid molecules encoding NOVX proteins from other species, and 
thus that have a nucleotide sequence that differs from a human SEQ ID NO:2n-1, wherein n 
is an integer between 1 and 172, are intended to be within the scope of the invention. Nucleic 
acid molecules corresponding to natural allelic variants and homologues of the NOVX 
cDNAs of the invention can be isolated based on their homology to the human NOVX 
nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a 
hybridization probe according to standard hybridization techniques under stringent 
hybridization conditions. 

Accordingly, in another embodiment, an isolated nucleic acid molecule of the 
invention is at least 6 nucleotides in length and hybridizes under stringent conditions to the 
nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO:2n-1, wherein n is 
an integer between 1 and 172. In another embodiment, the nucleic acid is at least 10, 25, 50, 
100, 250, 500, 750, 1000, 1500, or 2000 or more nucleotides in length. In yet another 
embodiment, an isolated nucleic acid molecule of the invention hybridizes to the coding 
region. As used herein, the term "hybridizes under stringent conditions" is intended to 
describe conditions for hybridization and washing under which nucleotide sequences at least 
about 65% homologous to each other typically remain hybridized to each other. 

Homologs (i.e., nucleic acids encoding NOVX proteins derived from species other 
than human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or 
high stringency hybridization with all or a portion of the particular human sequence as a 
probe using methods well known in the art for nucleic acid hybridization and cloning. 

As used herein, the phrase "stringent hybridization conditions" refers to conditions 

30 under which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no 
other sequences. Stringent conditions are sequence-dependent and will be different in 
different circumstances. Longer sequences hybridize specifically at higher temperatures than 
shorter sequences. Generally, stringent conditions are selected to be about 5 °C lower than the
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The
Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at
which 50% of the probes complementary to the target sequence hybridize to the target
sequence at equilibrium. Since the target sequences are generally present at excess, at Tm,
50% of the probes are occupied at equilibrium. Typically, stringent conditions will be those
in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to
1.0 M sodium ion (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30 °C
for short probes, primers or oligonucleotides (e.g., 10 nt to 50 nt) and at least about 60 °C for
longer probes, primers and oligonucleotides. Stringent conditions may also be achieved with
the addition of destabilizing agents, such as formamide.
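
As a rough illustration of the rule of thumb above (stringent conditions about 5 °C below Tm), the following sketch estimates a melting temperature for a short probe and subtracts the offset. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is an assumption introduced for illustration only; it is not taken from this specification, ignores ionic strength and pH, and is only a coarse approximation for short oligonucleotides. The function names are hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Estimate Tm in degrees C using the Wallace rule (assumed, approximate)."""
    seq = probe.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("probe must be a non-empty DNA sequence of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def stringent_temperature(probe: str, offset: int = 5) -> int:
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm(probe) - offset


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGT"  # hypothetical 16-nt probe
    print(wallace_tm(probe), stringent_temperature(probe))
```

For longer probes, a salt-adjusted or nearest-neighbor estimate would be more appropriate than this simple count.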

Stringent conditions are known to those skilled in the art and can be found in Ausubel, 
et al., (eds.), CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, N.Y.
(1989), 6.3.1-6.3.6. Preferably, the conditions are such that sequences at least about 65%,
70%, 75%, 85%, 90%, 95%, 98%, or 99% homologous to each other typically remain
hybridized to each other. A non-limiting example of stringent hybridization conditions is
hybridization in a high salt buffer comprising 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM
EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 mg/ml denatured salmon sperm
DNA at 65°C, followed by one or more washes in 0.2X SSC, 0.01% BSA at 50°C. An
isolated nucleic acid molecule of the invention that hybridizes under stringent conditions to a
sequence of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, corresponds to a
naturally-occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic 
acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs 
in nature (e.g., encodes a natural protein). 

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic
acid molecule comprising the nucleotide sequence of SEQ ID NO:2n-1, wherein n is an
integer between 1 and 172, or fragments, analogs or derivatives thereof, under conditions of
moderate stringency is provided. A non-limiting example of moderate stringency
hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS
and 100 mg/ml denatured salmon sperm DNA at 55 °C, followed by one or more washes in
1X SSC, 0.1% SDS at 37 °C. Other conditions of moderate stringency that may be used are
well-known within the art. See, e.g., Ausubel, et al. (eds.), 1993, CURRENT PROTOCOLS IN
MOLECULAR BIOLOGY, John Wiley & Sons, NY, and Kriegler, 1990, GENE TRANSFER AND
EXPRESSION, A LABORATORY MANUAL, Stockton Press, NY.

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule
comprising the nucleotide sequences of SEQ ID NO:2n-1, wherein n is an integer between 1
and 172, or fragments, analogs or derivatives thereof, under conditions of low stringency, is
provided. A non-limiting example of low stringency hybridization conditions is
hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02%
PVP, 0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% (wt/vol)
dextran sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl (pH
7.4), 5 mM EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may be
used are well known in the art (e.g., as employed for cross-species hybridizations). See, e.g.,
Ausubel, et al. (eds.), 1993, CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley &
Sons, NY, and Kriegler, 1990, GENE TRANSFER AND EXPRESSION, A LABORATORY MANUAL,
Stockton Press, NY; Shilo and Weinberg, 1981, Proc Natl Acad Sci USA 78: 6789-6792.

Conservative Mutations 

In addition to naturally-occurring allelic variants of NOVX sequences that may exist
in the population, the skilled artisan will further appreciate that changes can be introduced by
mutation into the nucleotide sequences of SEQ ID NO:2n-1, wherein n is an integer between
1 and 172, thereby leading to changes in the amino acid sequences of the encoded NOVX
protein, without altering the functional ability of that NOVX protein. For example,
nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid
residues can be made in the sequence of SEQ ID NO:2n, wherein n is an integer between 1
and 172. A "non-essential" amino acid residue is a residue that can be altered from the
wild-type sequences of the NOVX proteins without altering their biological activity, whereas
an "essential" amino acid residue is required for such biological activity. For example, amino
acid residues that are conserved among the NOVX proteins of the invention are predicted to
be particularly non-amenable to alteration. Amino acids for which conservative substitutions
can be made are well-known within the art.

Another aspect of the invention pertains to nucleic acid molecules encoding NOVX
proteins that contain changes in amino acid residues that are not essential for activity. Such
NOVX proteins differ in amino acid sequence from SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172, yet retain biological activity. In one embodiment, the isolated nucleic
acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein
comprises an amino acid sequence at least about 40% homologous to the amino acid
sequences of SEQ ID NO:2n, wherein n is an integer between 1 and 172. Preferably, the
protein encoded by the nucleic acid molecule is at least about 60% homologous to SEQ ID
NO:2n, wherein n is an integer between 1 and 172; more preferably at least about 70%
homologous to SEQ ID NO:2n, wherein n is an integer between 1 and 172; still more
preferably at least about 80% homologous to SEQ ID NO:2n, wherein n is an integer between
1 and 172; even more preferably at least about 90% homologous to SEQ ID NO:2n, wherein
n is an integer between 1 and 172; and most preferably at least about 95% homologous to
SEQ ID NO:2n, wherein n is an integer between 1 and 172.
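
The graded thresholds above (40% through 95%) can be illustrated with a minimal sketch. It assumes, purely for illustration, that "homologous" is measured as percent identity over two sequences that have already been aligned to equal length (with "-" marking gaps); the specification does not fix the scoring method here, and the function names are hypothetical.

```python
from typing import Optional

# Thresholds recited in the paragraph above, highest first.
THRESHOLDS = (95, 90, 80, 70, 60, 40)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns (assumed measure of homology)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where both sequences carry a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if not (x == "-" and y == "-")]
    if not pairs:
        raise ValueError("no aligned positions")
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(pairs)


def highest_threshold_met(a: str, b: str) -> Optional[int]:
    """Return the highest recited threshold that the pair satisfies, if any."""
    pid = percent_identity(a, b)
    for threshold in THRESHOLDS:
        if pid >= threshold:
            return threshold
    return None
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only scores an alignment that already exists.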

An isolated nucleic acid molecule encoding a NOVX protein homologous to the
protein of SEQ ID NO:2n, wherein n is an integer between 1 and 172, can be created by
introducing one or more nucleotide substitutions, additions or deletions into the nucleotide
sequence of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, such that one or
more amino acid substitutions, additions or deletions are introduced into the encoded protein.
Mutations can be introduced into any one of SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172, by standard techniques, such as site-directed mutagenesis and
PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at 
one or more predicted, non-essential amino acid residues. A "conservative amino acid 
substitution" is one in which the amino acid residue is replaced with an amino acid residue 
having a similar side chain. Families of amino acid residues having similar side chains have
been defined within the art. These families include amino acids with basic side chains (e.g.,
lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged 
polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), 
nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, 
methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and
aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted
non-essential amino acid residue in the NOVX protein is replaced with another amino acid 
residue from the same side chain family. Alternatively, in another embodiment, mutations 
can be introduced randomly along all or part of a NOVX coding sequence, such as by 
saturation mutagenesis, and the resultant mutants can be screened for NOVX biological
activity to identify mutants that retain activity. Following mutagenesis of a nucleic acid of
SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, the encoded protein can be
expressed by any recombinant technology known in the art and the activity of the protein can
be determined.

The relatedness of amino acid families may also be determined based on side chain
interactions. Substituted amino acids may be fully conserved "strong" residues or fully
conserved "weak" residues. The "strong" group of conserved amino acid residues may be any
one of the following groups: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY,
FYW, wherein the single letter amino acid codes are grouped by those amino acids that may
be substituted for each other. Likewise, the "weak" group of conserved residues may be any
one of the following: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK,
NEQHRK, HFY, wherein the letters within each group represent the single letter amino acid
code.
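
The side chain families and the "strong" and "weak" groups listed above can be expressed as simple lookups. The following is a minimal sketch under the assumption that a substitution counts as conservative when both residues share at least one listed family; the function names are hypothetical, and the group memberships are copied from the text above.

```python
from typing import List, Optional

# Side chain families as listed in the text (single letter codes).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}

STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW")
WEAK_GROUPS = ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "HFY")


def shared_families(a: str, b: str) -> List[str]:
    """Names of the side chain families containing both residues."""
    a, b = a.upper(), b.upper()
    return sorted(name for name, members in SIDE_CHAIN_FAMILIES.items()
                  if a in members and b in members)


def is_conservative(a: str, b: str) -> bool:
    """True if a -> b replaces a residue with a different one from a shared family."""
    return a.upper() != b.upper() and bool(shared_families(a, b))


def group_class(a: str, b: str) -> Optional[str]:
    """Classify a pair of single-letter residues as 'strong', 'weak', or None."""
    a, b = a.upper(), b.upper()
    if any(a in g and b in g for g in STRONG_GROUPS):
        return "strong"
    if any(a in g and b in g for g in WEAK_GROUPS):
        return "weak"
    return None
```

For example, a lysine to arginine change shares the basic family, whereas lysine to aspartic acid shares none of the listed families.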

In one embodiment, a mutant NOVX protein can be assayed for (i) the ability to form
protein:protein interactions with other NOVX proteins, other cell-surface proteins, or
biologically-active portions thereof; (ii) complex formation between a mutant NOVX protein
and a NOVX ligand; or (iii) the ability of a mutant NOVX protein to bind to an intracellular
target protein or biologically-active portion thereof (e.g., avidin proteins).

In yet another embodiment, a mutant NOVX protein can be assayed for the ability to
regulate a specific biological function (e.g., regulation of insulin release).

Interfering RNA 

In one aspect of the invention, NOVX gene expression can be attenuated by RNA 
interference. One approach well-known in the art is short interfering RNA (siRNA) mediated
gene silencing where expression products of a NOVX gene are targeted by specific double
stranded NOVX derived siRNA nucleotide sequences that are complementary to at least a 19-
25 nt long segment of the NOVX gene transcript, including the 5' untranslated (UT) region,
the ORF, or the 3' UT region. See, e.g., PCT applications WO00/44895, WO99/32619,
WO01/75164, WO01/92513, WO01/29058, WO01/89304, WO02/16620, and WO02/29858,
each incorporated by reference herein in their entirety. Targeted genes can be a NOVX gene,
or an upstream or downstream modulator of the NOVX gene. Nonlimiting examples of 
upstream or downstream modulators of a NOVX gene include, e.g., a transcription factor 
that binds the NOVX gene promoter, a kinase or phosphatase that interacts with a NOVX 
polypeptide, and polypeptides involved in a NOVX regulatory pathway. 

According to the methods of the present invention, NOVX gene expression is silenced
using short interfering RNA. A NOVX polynucleotide according to the invention includes a
siRNA polynucleotide. Such a NOVX siRNA can be obtained using a NOVX polynucleotide
sequence, for example, by processing the NOVX ribopolynucleotide sequence in a cell-free
system, such as but not limited to a Drosophila extract, or by transcription of recombinant
double stranded NOVX RNA or by chemical synthesis of nucleotide sequences homologous
to a NOVX sequence. See, e.g., Tuschl, Zamore, Lehmann, Bartel and Sharp (1999), Genes
& Dev. 13: 3191-3197, incorporated herein by reference in its entirety. When synthesized, a
typical 0.2 micromolar-scale RNA synthesis provides about 1 milligram of siRNA, which is 
sufficient for 1000 transfection experiments using a 24-well tissue culture plate format. 

The most efficient silencing is generally observed with siRNA duplexes composed of 
a 21-nt sense strand and a 21-nt antisense strand, paired in a manner to have a 2-nt
3' overhang. The sequence of the 2-nt 3' overhang makes an additional small contribution to
the specificity of siRNA target recognition. The contribution to specificity is localized to the
unpaired nucleotide adjacent to the first paired bases. In one embodiment, the nucleotides in
the 3' overhang are ribonucleotides. In an alternative embodiment, the nucleotides in the 3'
overhang are deoxyribonucleotides. Using 2'-deoxyribonucleotides in the 3' overhangs is as
efficient as using ribonucleotides, but deoxyribonucleotides are often cheaper to synthesize
and are most likely more nuclease resistant. 
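
The duplex geometry described above (two 21-nt strands with 19 paired bases and a 2-nt 3' overhang on each strand) can be sketched as follows. This is an illustrative construction only: the 19-nt core, the default "TT" deoxyribonucleotide overhang, and the function names are assumptions chosen for the example, not sequences from this specification.

```python
# Complement table for RNA letters.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write T as U."""
    return seq.upper().replace("T", "U")


def build_sirna_duplex(core: str, overhang: str = "TT"):
    """Return (sense, antisense), both written 5'->3'.

    `core` is the 19-nt target sequence in the sense orientation.
    `overhang` is appended to the 3' end of each strand; "TT" here
    stands for two deoxythymidines, "UU" for ribonucleotides.
    """
    core_rna = to_rna(core)
    if len(core_rna) != 19 or set(core_rna) - set("ACGU"):
        raise ValueError("core must be 19 nt of A, C, G, U/T")
    if len(overhang) != 2:
        raise ValueError("overhang must be 2 nt")
    sense = core_rna + overhang
    # Antisense strand: reverse complement of the core, plus its own overhang.
    antisense = core_rna.translate(_RNA_COMPLEMENT)[::-1] + overhang
    return sense, antisense


if __name__ == "__main__":
    s, a = build_sirna_duplex("GCAUGCAUGCAUGCAUGCA")  # hypothetical core
    print(s, a)
```

Each returned strand is 21 nt long; when the two are annealed, the 19-nt cores pair and the two 3' dinucleotides remain unpaired.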

A contemplated recombinant expression vector of the invention comprises a NOVX 
DNA molecule cloned into an expression vector comprising operatively-linked regulatory 
sequences flanking the NOVX sequence in a manner that allows for expression (by
transcription of the DNA molecule) of both strands. An RNA molecule that is antisense to
NOVX mRNA is transcribed by a first promoter (e.g., a promoter sequence 3' of the cloned
DNA) and an RNA molecule that is the sense strand for the NOVX mRNA is transcribed by
a second promoter (e.g., a promoter sequence 5' of the cloned DNA). The sense and
antisense strands may hybridize in vivo to generate siRNA constructs for silencing of the
NOVX gene. Alternatively, two constructs can be utilized to create the sense and anti-sense
strands of a siRNA construct. Finally, cloned DNA can encode a construct having secondary 
structure, wherein a single transcript has both the sense and complementary antisense 
sequences from the target gene or genes. In an example of this embodiment, a hairpin RNAi 
product is homologous to all or a portion of the target gene. In another example, a hairpin
RNAi product is a siRNA. The regulatory sequences flanking the NOVX sequence may be
identical or may be different, such that their expression may be modulated independently, or
in a temporal or spatial manner.

In a specific embodiment, siRNAs are transcribed intracellularly by cloning the
NOVX gene templates into a vector containing, e.g., a RNA pol III transcription unit from
the small nuclear RNA (snRNA) U6 or the human RNase P RNA H1. One example of a
vector system is the GeneSuppressor™ RNA Interference kit (commercially available from
Imgenex). The U6 and H1 promoters are members of the type III class of Pol III promoters.
The +1 nucleotide of the U6-like promoters is always guanosine, whereas the +1 for H1
promoters is adenosine. The termination signal for these promoters is defined by five
consecutive thymidines. The transcript is typically cleaved after the second uridine. Cleavage
at this position generates a 3' UU overhang in the expressed siRNA, which is similar to the 3'
overhangs of synthetic siRNAs. Any sequence less than 400 nucleotides in length can be
transcribed by these promoters; they are therefore ideally suited for the expression of around
21-nucleotide siRNAs in, e.g., an approximately 50-nucleotide RNA stem-loop transcript.
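
The stem-loop arrangement described above can be sketched as a DNA template: sense sequence, loop, reverse-complement sequence, then five thymidines as the terminator. The loop sequence used below and the function names are assumptions introduced for illustration; the specification does not prescribe a particular loop. The check on the first nucleotide reflects the stated +1 preference (G for U6-like promoters, A for H1).

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def hairpin_template(sense: str, loop: str = "TTCAAGAGA",
                     promoter: str = "U6") -> dict:
    """Build a hypothetical Pol III hairpin insert (DNA, 5'->3')."""
    plus_one = {"U6": "G", "H1": "A"}
    if promoter not in plus_one:
        raise ValueError("promoter must be 'U6' or 'H1'")
    sense, loop = sense.upper(), loop.upper()
    if not sense or set(sense + loop) - set("ACGT"):
        raise ValueError("sense and loop must be DNA sequences of A, C, G, T")
    body = sense + loop + reverse_complement(sense)
    # Five consecutive thymidines terminate transcription, so an
    # internal run would truncate the transcript.
    if "TTTTT" in body:
        raise ValueError("internal run of five T's would terminate early")
    insert = body + "TTTTT"
    if len(insert) >= 400:
        raise ValueError("insert must be shorter than 400 nt")
    return {
        "insert": insert,
        "stem_loop_length": len(body),
        "starts_with_plus1": sense[0] == plus_one[promoter],
    }
```

With a 21-nt sense sequence and the assumed 9-nt loop, the stem-loop portion is 51 nt, in line with the approximately 50-nucleotide transcript mentioned above.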

A siRNA vector appears to have an advantage over synthetic siRNAs where long term 
knock-down of expression is desired. Cells transfected with a siRNA expression vector
would experience steady, long-term mRNA inhibition. In contrast, cells transfected with
exogenous synthetic siRNAs typically recover from mRNA suppression within seven days or
ten rounds of cell division. The long-term gene silencing ability of siRNA expression vectors 
may provide for applications in gene therapy. 

In general, siRNAs are chopped from longer dsRNA by an ATP-dependent
ribonuclease called DICER. DICER is a member of the RNase III family of double-stranded
RNA-specific endonucleases. The siRNAs assemble with cellular proteins into an
endonuclease complex. In vitro studies in Drosophila suggest that the siRNAs/protein
complex (siRNP) is then transferred to a second enzyme complex, called an RNA-induced
silencing complex (RISC), which contains an endoribonuclease that is distinct from DICER.
RISC uses the sequence encoded by the antisense siRNA strand to find and destroy mRNAs
of complementary sequence. The siRNA thus acts as a guide, restricting the ribonuclease to 
cleave only mRNAs complementary to one of the two siRNA strands. 

A NOVX mRNA region to be targeted by siRNA is generally selected from a desired
NOVX sequence beginning 50 to 100 nt downstream of the start codon. Alternatively, 5' or 3'
UTRs and regions nearby the start codon can be used but are generally avoided, as these may
be richer in regulatory protein binding sites. UTR-binding proteins and/or translation
initiation complexes may interfere with binding of the siRNP or RISC endonuclease
complex. An initial BLAST homology search for the selected siRNA sequence is done
against an available nucleotide sequence library to ensure that only one gene is targeted.
The specificity of target recognition by siRNA duplexes is such that a single point mutation
located in the paired region of an siRNA duplex is sufficient to abolish target mRNA
degradation. See, Elbashir et al. 2001 EMBO J. 20(23):6877-88. Hence, consideration
should be taken to accommodate SNPs, polymorphisms, allelic variants or species-specific

In one embodiment, a complete NOVX siRNA experiment includes the proper 
negative control. A negative control siRNA generally has the same nucleotide composition
as the NOVX siRNA but lacks significant sequence homology to the genome. Typically, one
would scramble the nucleotide sequence of the NOVX siRNA and do a homology search to
make sure it lacks homology to any other gene. 
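
The scrambling step described above can be sketched as a shuffle that preserves nucleotide composition. This is a minimal illustration with a hypothetical function name; it does not perform the homology search, which would still have to be run separately on the scrambled sequence.

```python
import random
from collections import Counter


def scramble(seq: str, seed: int = 0, max_tries: int = 1000) -> str:
    """Return a shuffled sequence with the same composition as `seq`."""
    if len(set(seq)) < 2:
        raise ValueError("sequence must contain at least two different letters")
    rng = random.Random(seed)  # seeded so the control is reproducible
    letters = list(seq)
    for _ in range(max_tries):
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != seq:
            return candidate
    raise RuntimeError("could not produce a sequence different from the input")


if __name__ == "__main__":
    original = "GCAUGCAUGCAUGCAUGCA"  # hypothetical siRNA core
    control = scramble(original)
    # Composition is unchanged even though the order differs.
    print(control, Counter(control) == Counter(original))
```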

Two independent NOVX siRNA duplexes can be used to knock-down a target NOVX 
gene. This helps to control for specificity of the silencing effect. In addition, expression of 
two independent genes can be simultaneously knocked down by using equal concentrations
of different NOVX siRNA duplexes, e.g., a NOVX siRNA and an siRNA for a regulator of a
NOVX gene or polypeptide. Availability of siRNA-associating proteins is believed to be 
more limiting than target mRNA accessibility. 

A targeted NOVX region is typically a sequence of two adenines (AA) and two 
thymidines (TT) divided by a spacer region of nineteen (N19) residues (e.g., AA(N19)TT).
A desirable spacer region has a G/C-content of approximately 30% to 70%, and more
preferably of about 50%. If the sequence AA(N19)TT is not present in the target sequence,
an alternative target region would be AA(N21). The sequence of the NOVX sense siRNA
corresponds to (N19)TT or N21, respectively. In the latter case, conversion of the 3' end of
the sense siRNA to TT can be performed if such a sequence does not naturally occur in the
NOVX polynucleotide. The rationale for this sequence conversion is to generate a symmetric
duplex with respect to the sequence composition of the sense and antisense 3' overhangs.
Symmetric 3' overhangs may help to ensure that the siRNPs are formed with approximately
equal ratios of sense and antisense target RNA-cleaving siRNPs. See, e.g., Elbashir,
Lendeckel and Tuschl (2001). Genes & Dev. 15: 188-200, incorporated by reference herein in
its entirety. The modification of the overhang of the sense sequence of the siRNA duplex is
not expected to affect targeted mRNA recognition, as the antisense siRNA strand guides
target recognition.

Alternatively, if the NOVX target mRNA does not contain a suitable AA(N21)
sequence, one may search for the sequence NA(N21). Further, the sequence of the sense
strand and antisense strand may still be synthesized as 5' (N19)TT, as it is believed that the
sequence of the 3'-most nucleotide of the antisense siRNA does not contribute to specificity.
Unlike antisense or ribozyme technology, the secondary structure of the target mRNA does
not appear to have a strong effect on silencing. See, Harborth, et al. (2001) J. Cell Science
114: 4557-4565, incorporated by reference in its entirety.
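
The target selection rules above (AA(N19)TT first, then AA(N21), then NA(N21), with a G/C content of roughly 30% to 70% and a start at least 50 nt downstream of the start codon) can be sketched as a simple scan. The function names, the choice to measure G/C content on the first 19 nt of the spacer, and the use of a fixed 50-nt offset are assumptions made for this illustration.

```python
import re

# Patterns tried in order of preference; lookaheads allow overlapping hits.
_PATTERNS = (
    ("AA(N19)TT", re.compile(r"(?=(AA([ACGT]{19})TT))")),
    ("AA(N21)", re.compile(r"(?=(AA([ACGT]{21})))")),
    ("NA(N21)", re.compile(r"(?=([ACGT]A([ACGT]{21})))")),
)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_target_sites(cdna: str, start_codon_index: int, min_offset: int = 50,
                      gc_min: float = 0.30, gc_max: float = 0.70):
    """Return (position, pattern name, site) tuples for the first pattern with hits."""
    seq = cdna.upper().replace("U", "T")
    first_allowed = start_codon_index + min_offset
    for name, pattern in _PATTERNS:
        hits = []
        for match in pattern.finditer(seq):
            if match.start() < first_allowed:
                continue  # too close to the start codon
            core = match.group(2)[:19]  # 19-nt region that will be base-paired
            if gc_min <= gc_fraction(core) <= gc_max:
                hits.append((match.start(), name, match.group(1)))
        if hits:
            return hits
    return []
```

Candidate sites returned by such a scan would still need the homology search described earlier before use.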

Transfection of NOVX siRNA duplexes can be achieved using standard nucleic acid 
transfection methods, for example, OLIGOFECTAMINE Reagent (commercially available
from Invitrogen). An assay for NOVX gene silencing is generally performed approximately
2 days after transfection. No NOVX gene silencing has been observed in the absence of
transfection reagent, allowing for a comparative analysis of the wild-type and silenced
NOVX phenotypes. In a specific embodiment, for one well of a 24-well plate, approximately
0.84 µg of the siRNA duplex is generally sufficient. Cells are typically seeded the previous
day, and are transfected at about 50% confluence. The choice of cell culture media and
conditions is routine to those of skill in the art, and will vary with the choice of cell type.
The efficiency of transfection may depend on the cell type, but also on the passage number
and the confluency of the cells. The time and the manner of formation of siRNA-liposome
complexes (e.g., inversion versus vortexing) are also critical. Low transfection efficiencies
are the most frequent cause of unsuccessful NOVX silencing. The efficiency of transfection
needs to be carefully examined for each new cell line to be used. Preferred cells are derived
from a mammal, more preferably from a rodent such as a rat or mouse, and most preferably
from a human. Where used for therapeutic treatment, the cells are preferentially autologous,
although non-autologous cell sources are also contemplated as within the scope of the present
invention.
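
As a quick arithmetic check, the per-well amount stated above (approximately 0.84 µg of duplex per well of a 24-well plate) is consistent with the earlier statement that about 1 milligram of siRNA suffices for 1000 transfections. The sketch below simply divides one figure by the other; the function name is hypothetical and no losses during handling are modeled.

```python
def wells_per_synthesis(yield_ug: float = 1000.0, ug_per_well: float = 0.84) -> int:
    """Number of whole wells that a synthesis yield can supply."""
    if yield_ug < 0 or ug_per_well <= 0:
        raise ValueError("yield must be non-negative and per-well amount positive")
    return int(yield_ug // ug_per_well)


if __name__ == "__main__":
    # 1 mg = 1000 micrograms of siRNA at 0.84 micrograms per well.
    print(wells_per_synthesis())
```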

For a control experiment, transfection of 0.84 µg single-stranded sense NOVX siRNA
will have no effect on NOVX silencing, and 0.84 µg antisense siRNA has a weak silencing
effect when compared to 0.84 µg of duplex siRNAs. Control experiments again allow for a
comparative analysis of the wild-type and silenced NOVX phenotypes. To control for
transfection efficiency, targeting of common proteins is typically performed, for example
targeting of lamin A/C or transfection of a CMV-driven EGFP-expression plasmid (e.g.,
commercially available from Clontech). In the above example, the
fraction of lamin A/C knockdown in cells is determined the next day by such techniques as
immunofluorescence, Western blot, Northern blot or other similar assays for protein
expression or gene expression. Lamin A/C monoclonal antibodies may be obtained from 
Santa Cruz Biotechnology. 

Depending on the abundance and the half life (or turnover) of the targeted NOVX
polynucleotide in a cell, a knock-down phenotype may become apparent after 1 to 3 days, or
even later. In cases where no NOVX knock-down phenotype is observed, depletion of the 
NOVX polynucleotide may be observed by immunofluorescence or Western blotting. If the 
NOVX polynucleotide is still abundant after 3 days, cells need to be split and transferred to a 
fresh 24-well plate for re-transfection. If no knock-down of the targeted protein is observed,
it may be desirable to analyze whether the target mRNA (NOVX or a NOVX upstream or
downstream gene) was effectively destroyed by the transfected siRNA duplex. Two days 
after transfection, total RNA is prepared, reverse transcribed using a target-specific primer, 
and PCR-amplified with a primer pair covering at least one exon-exon junction in order to 
control for amplification of pre-mRNAs. RT/PCR of a non-targeted mRNA is also needed as
a control. Effective depletion of the mRNA yet undetectable reduction of target protein may
indicate that a large reservoir of stable NOVX protein may exist in the cell. Multiple 
transfection in sufficiently long intervals may be necessary until the target protein is finally 
depleted to a point where a phenotype may become apparent. If multiple transfection steps 
are required, cells are split 2 to 3 days after transfection. The cells may be transfected
immediately after splitting.

A therapeutic method of the invention contemplates administering a
NOVX siRNA construct as therapy to compensate for increased or aberrant NOVX 
expression or activity. The NOVX ribopolynucleotide is obtained and processed into siRNA 
fragments, or a NOVX siRNA is synthesized, as described above. The NOVX siRNA is
administered to cells or tissues using known nucleic acid transfection techniques, as
described above. A NOVX siRNA specific for a NOVX gene will decrease or knock down
NOVX transcription products, which will lead to reduced NOVX polypeptide production, 
resulting in reduced NOVX polypeptide activity in the cells or tissues. 

The present invention also encompasses a method of treating a disease or condition
associated with the presence of a NOVX protein in an individual comprising administering to
the individual an RNAi construct that targets the mRNA of the protein (the mRNA that
encodes the protein) for degradation. A specific RNAi construct includes a siRNA or a
double stranded gene transcript that is processed into siRNAs. Upon treatment, the target
protein is not produced or is not produced to the extent it would be in the absence of the
treatment. 

Where the NOVX gene function is not correlated with a known phenotype, a control 
sample of cells or tissues from healthy individuals provides a reference standard for
determining NOVX expression levels. Expression levels are detected using the assays
described, e.g., RT-PCR, Northern blotting, Western blotting, ELISA, and the like. A subject
sample of cells or tissues is taken from a mammal, preferably a human subject, suffering
from a disease state. The NOVX ribopolynucleotide is used to produce siRNA constructs
that are specific for the NOVX gene product. These cells or tissues are treated by
administering NOVX siRNAs to the cells or tissues by methods described for the
transfection of nucleic acids into a cell or tissue, and a change in NOVX polypeptide or
polynucleotide expression is observed in the subject sample relative to the control sample,
using the assays described. This NOVX gene knockdown approach provides a rapid method
for determination of a NOVX minus (NOVX⁻) phenotype in the treated subject sample. The
NOVX⁻ phenotype observed in the treated subject sample thus serves as a marker for
monitoring the course of a disease state during treatment. 

In specific embodiments, a NOVX siRNA is used in therapy. Methods for the
generation and use of a NOVX siRNA are known to those skilled in the art. Example
techniques are provided below.

Production of RNAs

Sense RNA (ssRNA) and antisense RNA (asRNA) of NOVX are produced using 
known methods such as transcription in RNA expression vectors. In the initial experiments, 
the sense and antisense RNA are about 500 bases in length each. The produced ssRNA and 
asRNA (0.5 µM) in 10 mM Tris-HCl (pH 7.5) with 20 mM NaCl are heated to 95°C for 1
min, then cooled and annealed at room temperature for 12 to 16 h. The RNAs are precipitated
and resuspended in lysis buffer (below). To monitor annealing, RNAs are electrophoresed in 
a 2% agarose gel in TBE buffer and stained with ethidium bromide. See, e.g., Sambrook et 
al., Molecular Cloning. Cold Spring Harbor Laboratory Press, Plainview, N.Y. (1989). 

Lysate Preparation 

Untreated rabbit reticulocyte lysate (Ambion) is assembled according to the
manufacturer's directions. dsRNA is incubated in the lysate at 30°C for 10 min prior to the
addition of mRNAs. Then NOVX mRNAs are added and the incubation continued for an
additional 60 min. The molar ratio of double stranded RNA and mRNA is about 200:1. The
NOVX mRNA is radiolabeled (using known techniques) and its stability is monitored by gel 
electrophoresis. 

In a parallel experiment made with the same conditions, the double stranded RNA is 
internally radiolabeled with α-32P-ATP. Reactions are stopped by the addition of 2X
proteinase K buffer and deproteinized as described previously (Tuschl et al., Genes Dev.,
13:3191-3197 (1999)). Products are analyzed by electrophoresis in 15% or 18%
polyacrylamide sequencing gels using appropriate RNA standards. By monitoring the gels
for radioactivity, the natural production of 10 to 25 nt RNAs from the double stranded RNA
can be determined.

The band of double stranded RNA, about 21-23 bp, is eluted. The efficacy of these
21-23 mers for suppressing NOVX transcription is assayed in vitro using the same rabbit
reticulocyte assay described above using 50 nanomolar of double stranded 21-23 mer for
each assay. The sequence of these 21-23 mers is then determined using standard nucleic acid
sequencing techniques.

RNA Preparation 

21 nt RNAs, based on the sequence determined above, are chemically synthesized 
using Expedite RNA phosphoramidites and thymidine phosphoramidite (Proligo, Germany). 
Synthetic oligonucleotides are deprotected and gel-purified (Elbashir, Lendeckel, & Tuschl, 
Genes & Dev. 15, 188-200 (2001)), followed by Sep-Pak C18 cartridge (Waters, Milford,
Mass., USA) purification (Tuschl, et al., Biochemistry, 32:11658-11668 (1993)).

These RNA single strands (20 µM) are incubated in annealing buffer (100 mM
potassium acetate, 30 mM HEPES-KOH at pH 7.4, 2 mM magnesium acetate) for 1 min at
90°C followed by 1 h at 37°C.

Cell Culture

A cell culture known in the art to regularly express NOVX is propagated using 
standard conditions. 24 hours before transfection, at approx. 80% confluency, the cells are 
trypsinized and diluted 1:5 with fresh medium without antibiotics (1-3 × 10^5 cells/ml) and
transferred to 24-well plates (500 µl/well). Transfection is performed using a commercially
available lipofection kit and NOVX expression is monitored using standard techniques with
positive and negative controls. A positive control is cells that naturally express NOVX while
a negative control is cells that do not express NOVX. Base-paired 21 and 22 nt siRNAs with
overhanging 3' ends mediate efficient sequence-specific mRNA degradation in lysates and in
cell culture. Different concentrations of siRNAs are used. An efficient concentration for
suppression in vitro in mammalian culture is between 25 nM and 100 nM final concentration.
This indicates that siRNAs are effective at concentrations that are several orders of
magnitude below the concentrations applied in conventional antisense or ribozyme gene
targeting experiments. 

The above method provides a way both for the deduction of NOVX siRNA sequence 
and the use of such siRNA for in vitro suppression. In vivo suppression may be performed 
using the same siRNA using well known in vivo transfection or gene therapy transfection
techniques.

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules 
that are hybridizable to or complementary to the nucleic acid molecule comprising the 
nucleotide sequence of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, or
fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a nucleotide
sequence that is complementary to a "sense" nucleic acid encoding a protein (e.g.,
complementary to the coding strand of a double-stranded cDNA molecule or complementary
to an mRNA sequence). In specific aspects, antisense nucleic acid molecules are provided
that comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500
nucleotides or an entire NOVX coding strand, or to only a portion thereof. Nucleic acid
molecules encoding fragments, homologs, derivatives and analogs of a NOVX protein of
SEQ ID NO:2n, wherein n is an integer between 1 and 172, or antisense nucleic acids
complementary to a NOVX nucleic acid sequence of SEQ ID NO:2n-1, wherein n is an
integer between 1 and 172, are additionally provided. 

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding
region" of the coding strand of a nucleotide sequence encoding a NOVX protein. The term
"coding region" refers to the region of the nucleotide sequence comprising codons which are
translated into amino acid residues. In another embodiment, the antisense nucleic acid
molecule is antisense to a "noncoding region" of the coding strand of a nucleotide sequence
encoding the NOVX protein. The term "noncoding region" refers to 5' and 3' sequences
which flank the coding region that are not translated into amino acids (i.e., also referred to as
5' and 3' untranslated regions).

Given the coding strand sequences encoding the NOVX protein disclosed herein,
antisense nucleic acids of the invention can be designed according to the rules of Watson and
Crick or Hoogsteen base pairing. The antisense nucleic acid molecule can be complementary
to the entire coding region of NOVX mRNA, but more preferably is an oligonucleotide that is
antisense to only a portion of the coding or noncoding region of NOVX mRNA. For
example, the antisense oligonucleotide can be complementary to the region surrounding the
translation start site of NOVX mRNA. An antisense oligonucleotide can be, for example,
about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid
of the invention can be constructed using chemical synthesis or enzymatic ligation reactions
using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense
oligonucleotide) can be chemically synthesized using naturally-occurring nucleotides or
variously modified nucleotides designed to increase the biological stability of the molecules
or to increase the physical stability of the duplex formed between the antisense and sense
nucleic acids (e.g., phosphorothioate derivatives and acridine substituted nucleotides can be
used).
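
The Watson-Crick design rule described above can be illustrated with a minimal
Python sketch. It returns a DNA oligonucleotide complementary to a window of an
mRNA around the translation start site. The function name `antisense_oligo`, the
window parameters, and the example sequence are hypothetical illustrations; no NOVX
sequence is reproduced, and Hoogsteen pairing and modified nucleotides are not modelled.

```python
# Complement of each RNA base, written as the DNA base of the antisense oligo.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna: str, start_index: int, upstream: int = 10, length: int = 20) -> str:
    """Return a DNA oligo (5'->3') complementary to `length` nt of `mrna`,
    beginning `upstream` nt before the AUG found at `start_index`."""
    mrna = mrna.upper().replace("T", "U")
    if mrna[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG codon")
    begin = max(0, start_index - upstream)
    target = mrna[begin:begin + length]
    if len(target) < length:
        raise ValueError("mRNA is too short for the requested window")
    # Reverse the target and complement each base to obtain the antisense strand.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


# Hypothetical mRNA fragment with its start codon at index 11.
example_mrna = "GGCACGAGGCCAUGGCUAGCAAGCUUGGA"
oligo = antisense_oligo(example_mrna, start_index=11, upstream=5, length=20)
print(oligo)
```

The 20-nucleotide default falls within the range of oligonucleotide lengths listed
above; any of the other listed lengths could be passed as `length`.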

Examples of modified nucleotides that can be used to generate the antisense nucleic
acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine,
xanthine, 4-acetylcytosine, 5-carboxymethylaminomethyl-2-thiouridine,
5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine,
1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 5-methoxyuracil,
3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, 2-thiouracil, 4-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine,
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the
antisense nucleic acid can be produced biologically using an expression vector into which a
nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the
inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest,
described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or
genomic DNA encoding a NOVX protein to thereby inhibit expression of the protein (e.g., by
inhibiting transcription and/or translation). The hybridization can be by conventional
nucleotide complementarity to form a stable duplex, or, for example, in the case of an
antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in
the major groove of the double helix. An example of a route of administration of antisense
nucleic acid molecules of the invention includes direct injection at a tissue site.
Alternatively, antisense nucleic acid molecules can be modified to target selected cells and
then administered systemically. For example, for systemic administration, antisense
molecules can be modified such that they specifically bind to receptors or antigens expressed
on a selected cell surface (e.g., by linking the antisense nucleic acid molecules to peptides or
antibodies that bind to cell surface receptors or antigens). The antisense nucleic acid
molecules can also be delivered to cells using the vectors described herein. To achieve
sufficient levels of the antisense nucleic acid molecules, vector constructs in which the
antisense nucleic acid molecule is placed under the control of a strong pol II or pol III
promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units,
the strands run parallel to each other. See, e.g., Gaultier, et al., 1987. Nucl. Acids Res. 15:
6625-6641. The antisense nucleic acid molecule can also comprise a
2'-o-methylribonucleotide (See, e.g., Inoue, et al., 1987. Nucl. Acids Res. 15: 6131-6148) or a
chimeric RNA-DNA analogue (See, e.g., Inoue, et al., 1987. FEBS Lett. 215: 327-330).

Ribozymes and PNA Moieties 

Nucleic acid modifications include, by way of non-limiting example, modified bases,
and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject.

In one embodiment, an antisense nucleic acid of the invention is a ribozyme.
Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of
cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a
complementary region. Thus, ribozymes (e.g., hammerhead ribozymes as described in
Haseloff and Gerlach 1988. Nature 334: 585-591) can be used to catalytically cleave NOVX
mRNA transcripts to thereby inhibit translation of NOVX mRNA. A ribozyme having
specificity for a NOVX-encoding nucleic acid can be designed based upon the nucleotide
sequence of a NOVX cDNA disclosed herein (i.e., SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172). For example, a derivative of a Tetrahymena L-19 IVS RNA can be
constructed in which the nucleotide sequence of the active site is complementary to the
nucleotide sequence to be cleaved in a NOVX-encoding mRNA. See, e.g., U.S. Patent
4,987,071 to Cech, et al. and U.S. Patent 5,116,742 to Cech, et al. NOVX mRNA can also be
used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA
molecules. See, e.g., Bartel et al., (1993) Science 261:1411-1418.

Alternatively, NOVX gene expression can be inhibited by targeting nucleotide
sequences complementary to the regulatory region of the NOVX nucleic acid (e.g., the
NOVX promoter and/or enhancers) to form triple helical structures that prevent transcription
of the NOVX gene in target cells. See, e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84;
Helene, et al. 1992. Ann. N.Y. Acad. Sci. 660: 27-36; Maher, 1992. Bioassays 14: 807-15.

In various embodiments, the NOVX nucleic acids can be modified at the base moiety,
sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility
of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can
be modified to generate peptide nucleic acids. See, e.g., Hyrup, et al., 1996. Bioorg Med
Chem 4: 5-23. As used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic
acid mimics (e.g., DNA mimics) in which the deoxyribose phosphate backbone is replaced by
a pseudopeptide backbone and only the four natural nucleotide bases are retained. The
neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and
RNA under conditions of low ionic strength. The synthesis of PNA oligomers can be
performed using standard solid phase peptide synthesis protocols as described in Hyrup, et
al., 1996. supra; Perry-O'Keefe, et al., 1996. Proc Natl. Acad Sci. USA 93: 14670-14675.

PNAs of NOVX can be used in therapeutic and diagnostic applications. For example,
PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene
expression by, e.g., inducing transcription or translation arrest or inhibiting replication.
PNAs of NOVX can also be used, for example, in the analysis of single base pair mutations
in a gene (e.g., PNA directed PCR clamping); as artificial restriction enzymes when used in
combination with other enzymes, e.g., S1 nucleases (See, Hyrup, et al., 1996. supra); or as
probes or primers for DNA sequence and hybridization (See, Hyrup, et al., 1996. supra;
Perry-O'Keefe, et al., 1996. supra).

In another embodiment, PNAs of NOVX can be modified, e.g., to enhance their
stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the
formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug
delivery known in the art. For example, PNA-DNA chimeras of NOVX can be generated
that may combine the advantageous properties of PNA and DNA. Such chimeras allow DNA
recognition enzymes (e.g., RNase H and DNA polymerases) to interact with the DNA portion
while the PNA portion would provide high binding affinity and specificity. PNA-DNA
chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking,
number of bonds between the nucleotide bases, and orientation (see, Hyrup, et al., 1996.
supra). The synthesis of PNA-DNA chimeras can be performed as described in Hyrup, et al.,
1996. supra and Finn, et al., 1996. Nucl Acids Res 24: 3357-3363. For example, a DNA
chain can be synthesized on a solid support using standard phosphoramidite coupling
chemistry, and modified nucleoside analogs, e.g.,
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite, can be used between the
PNA and the 5' end of DNA. See, e.g., Mag, et al., 1989. Nucl Acid Res 17: 5973-5988.
PNA monomers are then coupled in a stepwise manner to produce a chimeric molecule with a
5' PNA segment and a 3' DNA segment. See, e.g., Finn, et al., 1996. supra. Alternatively,
chimeric molecules can be synthesized with a 5' DNA segment and a 3' PNA segment. See,
e.g., Petersen, et al., 1975. Bioorg. Med. Chem. Lett. 5: 1119-11124.

In other embodiments, the oligonucleotide may include other appended groups such
as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport
across the cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86:
6553-6556; Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No.
WO88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In
addition, oligonucleotides can be modified with hybridization triggered cleavage agents (see,
e.g., Krol, et al., 1988. BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon,
1988. Pharm. Res. 5: 539-549). To this end, the oligonucleotide may be conjugated to
another molecule, e.g., a peptide, a hybridization triggered cross-linking agent, a transport
agent, a hybridization-triggered cleavage agent, and the like.

NOVX Polypeptides

A polypeptide according to the invention includes a polypeptide including the amino
acid sequence of NOVX polypeptides whose sequences are provided in any one of SEQ ID
NO:2n, wherein n is an integer between 1 and 172. The invention also includes a mutant or
variant protein any of whose residues may be changed from the corresponding residues
shown in any one of SEQ ID NO:2n, wherein n is an integer between 1 and 172, while still
encoding a protein that maintains its NOVX activities and physiological functions, or a
functional fragment thereof.

In general, a NOVX variant that preserves NOVX-like function includes any variant
in which residues at a particular position in the sequence have been substituted by other
amino acids, and further includes the possibility of inserting an additional residue or residues
between two residues of the parent protein as well as the possibility of deleting one or more
residues from the parent sequence. Any amino acid substitution, insertion, or deletion is
encompassed by the invention. In favorable circumstances, the substitution is a conservative
substitution as defined above.

One aspect of the invention pertains to isolated NOVX proteins, and
biologically-active portions thereof, or derivatives, fragments, analogs or homologs thereof.
Also provided are polypeptide fragments suitable for use as immunogens to raise anti-NOVX
antibodies. In one embodiment, native NOVX proteins can be isolated from cells or tissue
sources by an appropriate purification scheme using standard protein purification techniques.
In another embodiment, NOVX proteins are produced by recombinant DNA techniques.
As an alternative to recombinant expression, a NOVX protein or polypeptide can be synthesized
chemically using standard peptide synthesis techniques.

An "isolated" or "purified" polypeptide or protein or biologically-active portion
thereof is substantially free of cellular material or other contaminating proteins from the cell
or tissue source from which the NOVX protein is derived, or substantially free from chemical
precursors or other chemicals when chemically synthesized. The language "substantially free
of cellular material" includes preparations of NOVX proteins in which the protein is
separated from cellular components of the cells from which it is isolated or
recombinantly-produced. In one embodiment, the language "substantially free of cellular
material" includes preparations of NOVX proteins having less than about 30% (by dry
weight) of non-NOVX proteins (also referred to herein as a "contaminating protein"), more
preferably less than about 20% of non-NOVX proteins, still more preferably less than about
10% of non-NOVX proteins, and most preferably less than about 5% of non-NOVX proteins.
When the NOVX protein or biologically-active portion thereof is recombinantly-produced, it
is also preferably substantially free of culture medium, i.e., culture medium represents less
than about 20%, more preferably less than about 10%, and most preferably less than about
5% of the volume of the NOVX protein preparation.

The language "substantially free of chemical precursors or other chemicals" includes 
preparations of NOVX proteins in which the protein is separated from chemical precursors or 
other chemicals that are involved in the synthesis of the protein. In one embodiment, the 
language "substantially free of chemical precursors or other chemicals" includes preparations 
of NOVX proteins having less than about 30% (by dry weight) of chemical precursors or
non-NOVX chemicals, more preferably less than about 20% chemical precursors or 
non-NOVX chemicals, still more preferably less than about 10% chemical precursors or 
non-NOVX chemicals, and most preferably less than about 5% chemical precursors or 
non-NOVX chemicals. 

Biologically-active portions of NOVX proteins include peptides comprising amino
acid sequences sufficiently homologous to or derived from the amino acid sequences of the
NOVX proteins (e.g., the amino acid sequence of SEQ ID NO:2n, wherein n is an integer
between 1 and 172) that include fewer amino acids than the full-length NOVX proteins, and
exhibit at least one activity of a NOVX protein. Typically, biologically-active portions
comprise a domain or motif with at least one activity of the NOVX protein. A
biologically-active portion of a NOVX protein can be a polypeptide which is, for example,
10, 25, 50, 100 or more amino acid residues in length.

Moreover, other biologically-active portions, in which other regions of the protein are
deleted, can be prepared by recombinant techniques and evaluated for one or more of the
functional activities of a native NOVX protein.

In an embodiment, the NOVX protein has an amino acid sequence of SEQ ID NO:2n,
wherein n is an integer between 1 and 172. In other embodiments, the NOVX protein is
substantially homologous to SEQ ID NO:2n, wherein n is an integer between 1 and 172, and
retains the functional activity of the protein of SEQ ID NO:2n, wherein n is an integer
between 1 and 172, yet differs in amino acid sequence due to natural allelic variation or
mutagenesis, as described in detail, below. Accordingly, in another embodiment, the NOVX
protein is a protein that comprises an amino acid sequence at least about 45% homologous to
the amino acid sequence of SEQ ID NO:2n, wherein n is an integer between 1 and 172, and
retains the functional activity of the NOVX proteins of SEQ ID NO:2n, wherein n is an
integer between 1 and 172.

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic
acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be
introduced in the sequence of a first amino acid or nucleic acid sequence for optimal
alignment with a second amino acid or nucleic acid sequence). The amino acid residues or
nucleotides at corresponding amino acid positions or nucleotide positions are then compared.
When a position in the first sequence is occupied by the same amino acid residue or
nucleotide as the corresponding position in the second sequence, then the molecules are
homologous at that position (i.e., as used herein amino acid or nucleic acid "homology" is
equivalent to amino acid or nucleic acid "identity").

The nucleic acid sequence homology may be determined as the degree of identity
between two sequences. The homology may be determined using computer programs known
in the art, such as GAP software provided in the GCG program package. See, Needleman
and Wunsch, 1970. J Mol Biol 48: 443-453. Using GCG GAP software with the following
settings for nucleic acid sequence comparison: GAP creation penalty of 5.0 and GAP
extension penalty of 0.3, the coding region of the analogous nucleic acid sequences referred
to above exhibits a degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%,
98%, or 99%, with the CDS (encoding) part of the DNA sequence of SEQ ID NO:2n-1,
wherein n is an integer between 1 and 172.
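
As a rough illustration of global alignment with separate gap creation and gap
extension penalties, the following Python sketch scores two sequences with a
Needleman-Wunsch style recursion extended to affine gaps. It is not the GCG GAP
program: the match and mismatch scores, the function name `global_alignment_score`,
and the convention that a gap of length k costs `gap_open + k * gap_extend` are all
assumptions made for this example. Only the penalty values 5.0 and 0.3 come from the
text above.

```python
def global_alignment_score(a: str, b: str, match: float = 1.0, mismatch: float = -1.0,
                           gap_open: float = 5.0, gap_extend: float = 0.3) -> float:
    """Best global alignment score of `a` and `b` under affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + k * gap_extend.
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # m: best score ending in a residue-residue pair
    # x: best score ending with a residue of `a` opposite a gap
    # y: best score ending with a residue of `b` opposite a gap
    m = [[neg_inf] * cols for _ in range(rows)]
    x = [[neg_inf] * cols for _ in range(rows)]
    y = [[neg_inf] * cols for _ in range(rows)]
    m[0][0] = 0.0
    for i in range(1, rows):
        x[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, cols):
        y[0][j] = -(gap_open + j * gap_extend)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            m[i][j] = pair + max(m[i - 1][j - 1], x[i - 1][j - 1], y[i - 1][j - 1])
            # Either open a new gap (creation + first extension) or extend one.
            x[i][j] = max(m[i - 1][j] - gap_open - gap_extend,
                          y[i - 1][j] - gap_open - gap_extend,
                          x[i - 1][j] - gap_extend)
            y[i][j] = max(m[i][j - 1] - gap_open - gap_extend,
                          x[i][j - 1] - gap_open - gap_extend,
                          y[i][j - 1] - gap_extend)
    return max(m[-1][-1], x[-1][-1], y[-1][-1])


print(global_alignment_score("ACGT", "ACGT"))
print(global_alignment_score("ACGT", "AGT"))
```

With a creation penalty much larger than the extension penalty, the recursion favours
one long gap over several short ones, which is the usual motivation for affine penalties.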

The term "sequence identity" refers to the degree to which two polynucleotide or
polypeptide sequences are identical on a residue-by-residue basis over a particular region of
comparison. The term "percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over that region of comparison, determining the number of
positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of
nucleic acids) occurs in both sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of positions in the region of comparison
(i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence
identity. The term "substantial identity" as used herein denotes a characteristic of a
polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least
80 percent sequence identity, preferably at least 85 percent identity and often 90 to 95 percent
sequence identity, more usually at least 99 percent sequence identity as compared to a
reference sequence over a comparison region.
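
The percentage calculation defined above can be written out as a short Python sketch.
It assumes the two sequences have already been optimally aligned and padded to the same
length, with "-" marking gap positions; the function name `percent_identity` and the
example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are divided by the total number of positions in the
    window (the window size) and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    window_size = len(aligned_a)
    if window_size == 0:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for p, q in zip(aligned_a.upper(), aligned_b.upper())
        if p == q and p != gap  # a gap never counts as a matched position
    )
    return 100.0 * matched / window_size


# Nine of ten positions match in this hypothetical window.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Under the definition of "substantial identity" given above, a result of at least 80
would meet the lowest stated threshold.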

Chimeric and Fusion Proteins 

The invention also provides NOVX chimeric or fusion proteins. As used herein, a
NOVX "chimeric protein" or "fusion protein" comprises a NOVX polypeptide
operatively-linked to a non-NOVX polypeptide. A "NOVX polypeptide" refers to a
polypeptide having an amino acid sequence corresponding to a NOVX protein of SEQ ID
NO:2n, wherein n is an integer between 1 and 172, whereas a "non-NOVX polypeptide"
refers to a polypeptide having an amino acid sequence corresponding to a protein that is not
substantially homologous to the NOVX protein, e.g., a protein that is different from the
NOVX protein and that is derived from the same or a different organism. Within a NOVX
fusion protein the NOVX polypeptide can correspond to all or a portion of a NOVX protein.
In one embodiment, a NOVX fusion protein comprises at least one biologically-active
portion of a NOVX protein. In another embodiment, a NOVX fusion protein comprises at
least two biologically-active portions of a NOVX protein. In yet another embodiment, a
NOVX fusion protein comprises at least three biologically-active portions of a NOVX
protein. Within the fusion protein, the term "operatively-linked" is intended to indicate that
the NOVX polypeptide and the non-NOVX polypeptide are fused in-frame with one another.
The non-NOVX polypeptide can be fused to the N-terminus or C-terminus of the NOVX
polypeptide.

In one embodiment, the fusion protein is a GST-NOVX fusion protein in which the 
NOVX sequences are fused to the C-terminus of the GST (glutathione S-transferase) 
sequences. Such fusion proteins can facilitate the purification of recombinant NOVX 
polypeptides. 
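
The requirement that the two partners be fused in-frame can be checked at the
nucleotide level. The Python sketch below joins an N-terminal partner (for example a
tag such as GST) to a C-terminal partner and verifies that the reading frame is
preserved. The function name `fuse_in_frame` and the short sequences are hypothetical
placeholders; they are not GST or NOVX sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(n_terminal_cds: str, c_terminal_cds: str) -> str:
    """Join two coding sequences so the second is read in the frame of the first."""
    first = n_terminal_cds.upper()
    second = c_terminal_cds.upper()
    for name, cds in (("N-terminal", first), ("C-terminal", second)):
        if len(cds) % 3 != 0:
            raise ValueError(f"{name} coding sequence is not a whole number of codons")
    # Drop the stop codon of the N-terminal partner so translation reads through.
    if first[-3:] in STOP_CODONS:
        first = first[:-3]
    codons = [first[i:i + 3] for i in range(0, len(first), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("internal stop codon in the N-terminal partner")
    return first + second


# Hypothetical three-codon partners; the first ends in a stop codon that is removed.
fusion = fuse_in_frame("ATGGCCTAA", "ATGAAATGA")
print(fusion)
```

Because both inputs are whole numbers of codons and the upstream stop codon is removed,
every codon of the downstream partner keeps its original reading frame in the fusion.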

In another embodiment, the fusion protein is a NOVX protein containing a
heterologous signal sequence at its N-terminus. In certain host cells (e.g., mammalian host
cells), expression and/or secretion of NOVX can be increased through use of a heterologous
signal sequence.

In yet another embodiment, the fusion protein is a NOVX-immunoglobulin fusion
protein in which the NOVX sequences are fused to sequences derived from a member of the
immunoglobulin protein family. The NOVX-immunoglobulin fusion proteins of the
invention can be incorporated into pharmaceutical compositions and administered to a subject
to inhibit an interaction between a NOVX ligand and a NOVX protein on the surface of a
cell, to thereby suppress NOVX-mediated signal transduction in vivo. The
NOVX-immunoglobulin fusion proteins can be used to affect the bioavailability of a NOVX
cognate ligand. Inhibition of the NOVX ligand/NOVX interaction may be useful
therapeutically for both the treatment of proliferative and differentiative disorders, as well as
modulating (e.g. promoting or inhibiting) cell survival. Moreover, the
NOVX-immunoglobulin fusion proteins of the invention can be used as immunogens to
produce anti-NOVX antibodies in a subject, to purify NOVX ligands, and in screening assays
to identify molecules that inhibit the interaction of NOVX with a NOVX ligand.

A NOVX chimeric or fusion protein of the invention can be produced by standard
recombinant DNA techniques. For example, DNA fragments coding for the different
polypeptide sequences are ligated together in-frame in accordance with conventional
techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction
enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic
ligation. In another embodiment, the fusion gene can be synthesized by conventional
techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene
fragments can be carried out using anchor primers that give rise to complementary overhangs
between two consecutive gene fragments that can subsequently be annealed and reamplified
to generate a chimeric gene sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in
Molecular Biology, John Wiley & Sons, 1992). Moreover, many expression vectors are
commercially available that already encode a fusion moiety (e.g., a GST polypeptide). A
NOVX-encoding nucleic acid can be cloned into such an expression vector such that the
fusion moiety is linked in-frame to the NOVX protein.

NOVX Agonists and Antagonists

The invention also pertains to variants of the NOVX proteins that function as either
NOVX agonists (i.e., mimetics) or as NOVX antagonists. Variants of the NOVX protein can
be generated by mutagenesis (e.g., discrete point mutation or truncation of the NOVX
protein). An agonist of the NOVX protein can retain substantially the same, or a subset of,
the biological activities of the naturally occurring form of the NOVX protein. An antagonist
of the NOVX protein can inhibit one or more of the activities of the naturally occurring form
of the NOVX protein by, for example, competitively binding to a downstream or upstream
member of a cellular signaling cascade which includes the NOVX protein. Thus, specific
biological effects can be elicited by treatment with a variant of limited function. In one
embodiment, treatment of a subject with a variant having a subset of the biological activities
of the naturally occurring form of the protein has fewer side effects in a subject relative to
treatment with the naturally occurring form of the NOVX proteins.

Variants of the NOVX proteins that function as either NOVX agonists (i.e., mimetics)
or as NOVX antagonists can be identified by screening combinatorial libraries of mutants
(e.g., truncation mutants) of the NOVX proteins for NOVX protein agonist or antagonist
activity. In one embodiment, a variegated library of NOVX variants is generated by
combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene
library. A variegated library of NOVX variants can be produced by, for example,
enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a
degenerate set of potential NOVX sequences is expressible as individual polypeptides, or
alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of
NOVX sequences therein. There are a variety of methods which can be used to produce
libraries of potential NOVX variants from a degenerate oligonucleotide sequence. Chemical
synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer,
and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate
set of genes allows for the provision, in one mixture, of all of the sequences encoding the
desired set of potential NOVX sequences. Methods for synthesizing degenerate
oligonucleotides are well-known within the art. See, e.g., Narang, 1983. Tetrahedron 39: 3;
Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984. Science 198: 1056;
Ike, et al., 1983. Nucl. Acids Res. 11: 477.
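
To make the idea of a degenerate oligonucleotide concrete, the following Python sketch
enumerates every specific sequence represented by an oligonucleotide written with the
standard IUPAC ambiguity codes. The function name `expand_degenerate` and the example
oligonucleotides are hypothetical; the sketch only lists the members of the mixture and
does not model synthesis or ligation.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combination) for combination in product(*choices)]


# "AAR" stands for the two lysine codons AAA and AAG.
print(expand_degenerate("AAR"))
# A degenerate codon with N in the third position represents four sequences.
print(len(expand_degenerate("GCN")))
```

The size of the mixture is the product of the number of bases allowed at each position,
which is why a single degenerate synthesis can cover a whole set of potential variants.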



Polypeptide Libraries

In addition, libraries of fragments of the NOVX protein coding sequences can be used
to generate a variegated population of NOVX fragments for screening and subsequent
selection of variants of a NOVX protein. In one embodiment, a library of coding sequence
fragments can be generated by treating a double stranded PCR fragment of a NOVX coding
sequence with a nuclease under conditions wherein nicking occurs only about once per
molecule, denaturing the double stranded DNA, renaturing the DNA to form double-stranded
DNA that can include sense/antisense pairs from different nicked products, removing single
stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the
resulting fragment library into an expression vector. By this method, expression libraries can
be derived that encode N-terminal and internal fragments of various sizes of the NOVX
proteins.

Various techniques are known in the art for screening gene products of combinatorial
libraries made by point mutations or truncation, and for screening cDNA libraries for gene
products having a selected property. Such techniques are adaptable for rapid screening of the
gene libraries generated by the combinatorial mutagenesis of NOVX proteins. The most
widely used techniques, which are amenable to high throughput analysis, for screening large
gene libraries typically include cloning the gene library into replicable expression vectors,
transforming appropriate cells with the resulting library of vectors, and expressing the
combinatorial genes under conditions in which detection of a desired activity facilitates
isolation of the vector encoding the gene whose product was detected. Recursive ensemble
mutagenesis (REM), a new technique that enhances the frequency of functional mutants in
the libraries, can be used in combination with the screening assays to identify NOVX
variants. See, e.g., Arkin and Youvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815;
Delagrave, et al., 1993. Protein Engineering 6:327-331.

Anti-NOVX Antibodies 

Included in the invention are antibodies to NOVX proteins, or fragments of NOVX
proteins. The term "antibody" as used herein refers to immunoglobulin molecules and
immunologically active portions of immunoglobulin (Ig) molecules, i.e., molecules that
contain an antigen binding site that specifically binds (immunoreacts with) an antigen. Such
antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab,
Fab' and F(ab')2 fragments, and an Fab expression library. In general, antibody molecules
obtained from humans relate to any of the classes IgG, IgM, IgA, IgE and IgD, which differ
from one another by the nature of the heavy chain present in the molecule. Certain classes
have subclasses as well, such as IgG1, IgG2, and others. Furthermore, in humans, the light
chain may be a kappa chain or a lambda chain. Reference herein to antibodies includes a
reference to all such classes, subclasses and types of human antibody species.

An isolated protein of the invention intended to serve as an antigen, or a portion or
fragment thereof, can be used as an immunogen to generate antibodies that
immunospecifically bind the antigen, using standard techniques for polyclonal and
monoclonal antibody preparation. The full-length protein can be used or, alternatively, the
invention provides antigenic peptide fragments of the antigen for use as immunogens. An
antigenic peptide fragment comprises at least 6 amino acid residues of the amino acid
sequence of the full length protein, such as an amino acid sequence of SEQ ID NO:2n,
wherein n is an integer between 1 and 172, and encompasses an epitope thereof such that an
antibody raised against the peptide forms a specific immune complex with the full length
protein or with any fragment that contains the epitope. Preferably, the antigenic peptide
comprises at least 10 amino acid residues, or at least 15 amino acid residues, or at least 20
amino acid residues, or at least 30 amino acid residues. Preferred epitopes encompassed by
the antigenic peptide are regions of the protein that are located on its surface; commonly
these are hydrophilic regions.

In certain embodiments of the invention, at least one epitope encompassed by the 
antigenic peptide is a region of NOVX that is located on the surface of the protein, e.g., a 
hydrophilic region. A hydrophobicity analysis of the human NOVX protein sequence will 
indicate which regions of a NOVX polypeptide are particularly hydrophilic and, therefore, 
are likely to encode surface residues useful for targeting antibody production. As a means for 
targeting antibody production, hydropathy plots showing regions of hydrophilicity and 
hydrophobicity may be generated by any method well known in the art, including, for 
example, the Kyte Doolittle or the Hopp Woods methods, either with or without Fourier 
transformation. See, e.g., Hopp and Woods, 1981, Proc. Nat. Acad. Sci. USA 78: 3824-3828; 
Kyte and Doolittle 1982, J. Mol. Biol. 157: 105-142, each incorporated herein by reference in 
their entirety. Antibodies that are specific for one or more domains within an antigenic 
protein, or derivatives, fragments, analogs or homologs thereof, are also provided herein. 
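
By way of illustration only, the hydropathy analysis referred to above can be sketched as a 
sliding-window average over the Kyte Doolittle scale. The function names, the window of 7 
residues and the cutoff of -1.0 are assumptions chosen for this sketch and are not taken from 
the specification; Fourier transformation is not included. 

```python
# Kyte-Doolittle hydropathy index per residue; positive values are hydrophobic.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean Kyte-Doolittle score of every full window along the sequence.

    The window size is an illustrative assumption. Raises KeyError for
    residues outside the 20 standard amino acids.
    """
    scores = [KD_SCALE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophilic_windows(sequence, window=7, cutoff=-1.0):
    """(start index, mean score) of windows at or below the cutoff.

    The cutoff is a hypothetical threshold for flagging candidate
    surface-exposed (hydrophilic) stretches.
    """
    profile = hydropathy_profile(sequence, window)
    return [(i, mean) for i, mean in enumerate(profile) if mean <= cutoff]
```

Windows flagged by such a scan would only be candidates for antigenic peptide design; surface 
exposure would still need to be confirmed by other means. 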

The term "epitope" includes any protein determinant capable of specific binding to an 
immunoglobulin or T-cell receptor. Epitopic determinants usually consist of chemically 
active surface groupings of molecules such as amino acids or sugar side chains and usually 
have specific three dimensional structural characteristics, as well as specific charge 
characteristics. A NOVX polypeptide or a fragment thereof comprises at least one antigenic 
epitope. An anti-NOVX antibody of the present invention is said to specifically bind to 
antigen NOVX when the equilibrium binding constant (KD) is < 1 µM, preferably < 100 nM, 
more preferably < 10 nM, and most preferably < 100 pM to about 1 pM, as measured by 
assays such as radioligand binding assays or similar assays known to those skilled in the art. 

A protein of the invention, or a derivative, fragment, analog, homolog or ortholog 
thereof, may be utilized as an immunogen in the generation of antibodies that 
immunospecifically bind these protein components. 
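
As a minimal sketch, the equilibrium binding constant (KD) thresholds used above to define 
specific binding can be expressed as a simple classification. The function names and tier 
labels are hypothetical; the fraction-bound helper assumes simple one-site equilibrium 
binding, which is an assumption of this sketch rather than a statement of the specification. 

```python
def affinity_tier(kd_molar):
    """Map a KD (in mol/L) onto the tiers recited in the text.

    Tier labels are illustrative names, not terms defined by the document.
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    if kd_molar < 100e-12:      # < 100 pM
        return "most preferred"
    if kd_molar < 10e-9:        # < 10 nM
        return "more preferred"
    if kd_molar < 100e-9:       # < 100 nM
        return "preferred"
    if kd_molar < 1e-6:         # < 1 uM
        return "specific"
    return "not specific"


def fraction_bound(free_ligand_molar, kd_molar):
    """Fractional occupancy for one-site equilibrium binding: L / (L + KD)."""
    return free_ligand_molar / (free_ligand_molar + kd_molar)
```

For example, an antibody with a KD of 5 nM would fall in the "more preferred" tier, and at a 
free antigen concentration equal to KD half of the binding sites would be occupied. 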

Various procedures known within the art may be used for the production of 
polyclonal or monoclonal antibodies directed against a protein of the invention, or against 
derivatives, fragments, analogs, homologs or orthologs thereof (see, for example, Antibodies: 
A Laboratory Manual, Harlow E, and Lane D, 1988, Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, NY, incorporated herein by reference). Some of these antibodies are 
discussed below. 

Polyclonal Antibodies 

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit, 
goat, mouse or other mammal) may be immunized by one or more injections with the native 
protein, a synthetic variant thereof, or a derivative of the foregoing. An appropriate 
immunogenic preparation can contain, for example, the naturally occurring immunogenic 
protein, a chemically synthesized polypeptide representing the immunogenic protein, or a 
recombinantly expressed immunogenic protein. Furthermore, the protein may be conjugated 
to a second protein known to be immunogenic in the mammal being immunized. Examples 
of such immunogenic proteins include but are not limited to keyhole limpet hemocyanin, 
serum albumin, bovine thyroglobulin, and soybean trypsin inhibitor. The preparation can 
further include an adjuvant. Various adjuvants used to increase the immunological response 
include, but are not limited to, Freund's (complete and incomplete), mineral gels (e.g., 
aluminum hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols, 
polyanions, peptides, oil emulsions, dinitrophenol, etc.), adjuvants usable in humans such as 
Bacille Calmette-Guerin and Corynebacterium parvum, or similar immunostimulatory agents. 
Additional examples of adjuvants which can be employed include MPL-TDM adjuvant 
(monophosphoryl Lipid A, synthetic trehalose dicorynomycolate). 

The polyclonal antibody molecules directed against the immunogenic protein can be 
isolated from the mammal (e.g., from the blood) and further purified by well known 
techniques, such as affinity chromatography using protein A or protein G, which provide 
primarily the IgG fraction of immune serum. Subsequently, or alternatively, the specific 
antigen which is the target of the immunoglobulin sought, or an epitope thereof, may be 
immobilized on a column to purify the immune specific antibody by immunoaffinity 
chromatography. Purification of immunoglobulins is discussed, for example, by D. 
Wilkinson (The Scientist, published by The Scientist, Inc., Philadelphia PA, Vol. 14, No. 8 
(April 17, 2000), pp. 25-28). 

Monoclonal Antibodies 

The term "monoclonal antibody" (MAb) or "monoclonal antibody composition", as 
used herein, refers to a population of antibody molecules that contain only one molecular 
species of antibody molecule consisting of a unique light chain gene product and a unique 
heavy chain gene product. In particular, the complementarity determining regions (CDRs) of 
the monoclonal antibody are identical in all the molecules of the population. MAbs thus 
contain an antigen binding site capable of immunoreacting with a particular epitope of the 
antigen characterized by a unique binding affinity for it. 

Monoclonal antibodies can be prepared using hybridoma methods, such as those 
described by Kohler and Milstein, Nature, 256:495 (1975). In a hybridoma method, a mouse, 
hamster, or other appropriate host animal, is typically immunized with an immunizing agent 
to elicit lymphocytes that produce or are capable of producing antibodies that will 
specifically bind to the immunizing agent. Alternatively, the lymphocytes can be immunized 
in vitro. 

The immunizing agent will typically include the protein antigen, a fragment thereof 
or a fusion protein thereof. Generally, either peripheral blood lymphocytes are used if cells 
of human origin are desired, or spleen cells or lymph node cells are used if non-human 
mammalian sources are desired. The lymphocytes are then fused with an immortalized cell 
line using a suitable fusing agent, such as polyethylene glycol, to form a hybridoma cell 
(Goding, Monoclonal Antibodies: Principles and Practice, Academic Press, (1986) pp. 
59-103). Immortalized cell lines are usually transformed mammalian cells, particularly 
myeloma cells of rodent, bovine and human origin. Usually, rat or mouse myeloma cell lines 
are employed. The hybridoma cells can be cultured in a suitable culture medium that 
preferably contains one or more substances that inhibit the growth or survival of the unfused, 
immortalized cells. For example, if the parental cells lack the enzyme hypoxanthine guanine 
phosphoribosyl transferase (HGPRT or HPRT), the culture medium for the hybridomas 
typically will include hypoxanthine, aminopterin, and thymidine ("HAT medium"), which 
substances prevent the growth of HGPRT-deficient cells. 

Preferred immortalized cell lines are those that fuse efficiently, support stable high 
level expression of antibody by the selected antibody-producing cells, and are sensitive to a 
medium such as HAT medium. More preferred immortalized cell lines are murine myeloma 
lines, which can be obtained, for instance, from the Salk Institute Cell Distribution Center, 
San Diego, California and the American Type Culture Collection, Manassas, Virginia. 
Human myeloma and mouse-human heteromyeloma cell lines also have been described for 
the production of human monoclonal antibodies (Kozbor, J. Immunol., 133:3001 (1984); 
Brodeur et al., Monoclonal Antibody Production Techniques and Applications, Marcel 
Dekker, Inc., New York, (1987) pp. 51-63). 

The culture medium in which the hybridoma cells are cultured can then be assayed for 
the presence of monoclonal antibodies directed against the antigen. Preferably, the binding 
specificity of monoclonal antibodies produced by the hybridoma cells is determined by 
immunoprecipitation or by an in vitro binding assay, such as radioimmunoassay (RIA) or 
enzyme-linked immunosorbent assay (ELISA). Such techniques and assays are known in 
the art. The binding affinity of the monoclonal antibody can, for example, be determined by 
the Scatchard analysis of Munson and Pollard, Anal. Biochem., 107:220 (1980). It is an 
objective, especially important in therapeutic applications of monoclonal antibodies, to 
identify antibodies having a high degree of specificity and a high binding affinity for the 
target antigen. 
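
As a hedged illustration of how a binding affinity may be estimated from binding data, the 
following sketch applies the classical linear Scatchard transformation, in which the ratio of 
bound to free ligand (B/F) is plotted against bound ligand (B) and, for a single class of 
sites, the slope equals -1/KD and the x-intercept equals the maximum binding (Bmax). This is 
a textbook simplification assumed for the example and is not asserted to be the specific 
procedure of the cited reference; the function name is hypothetical. 

```python
def scatchard_fit(bound, free):
    """Least-squares line through the points (B, B/F).

    Returns (kd, bmax) in the same concentration units as the inputs.
    Assumes one class of independent binding sites.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired (bound, free) values")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # equals -1 / KD
    intercept = mean_y - slope * mean_x   # equals Bmax / KD
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax
```

With real assay data the points scatter about the line, and curvature in the plot would 
suggest that the single-site assumption does not hold. 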

After the desired hybridoma cells are identified, the clones can be subcloned by 
limiting dilution procedures and grown by standard methods (Goding, 1986). Suitable culture 
media for this purpose include, for example, Dulbecco's Modified Eagle's Medium and 
RPMI-1640 medium. Alternatively, the hybridoma cells can be grown in vivo as ascites in a 
mammal. 

The monoclonal antibodies secreted by the subclones can be isolated or purified from 
the culture medium or ascites fluid by conventional immunoglobulin purification procedures 
such as, for example, protein A-Sepharose, hydroxylapatite chromatography, gel 
electrophoresis, dialysis, or affinity chromatography. 

The monoclonal antibodies can also be made by recombinant DNA methods, such as 
those described in U.S. Patent No. 4,816,567. DNA encoding the monoclonal antibodies of 
the invention can be readily isolated and sequenced using conventional procedures (e.g., by 
using oligonucleotide probes that are capable of binding specifically to genes encoding the 
heavy and light chains of murine antibodies). The hybridoma cells of the invention serve as a 
preferred source of such DNA. Once isolated, the DNA can be placed into expression 
vectors, which are then transfected into host cells such as simian COS cells, Chinese hamster 
ovary (CHO) cells, or myeloma cells that do not otherwise produce immunoglobulin protein, 
to obtain the synthesis of monoclonal antibodies in the recombinant host cells. The DNA 
also can be modified, for example, by substituting the coding sequence for human heavy and 
light chain constant domains in place of the homologous murine sequences (U.S. Patent No. 
4,816,567; Morrison, Nature 368, 812-13 (1994)) or by covalently joining to the 
immunoglobulin coding sequence all or part of the coding sequence for a 
non-immunoglobulin polypeptide. Such a non-immunoglobulin polypeptide can be 
substituted for the constant domains of an antibody of the invention, or can be substituted for 
the variable domains of one antigen-combining site of an antibody of the invention to create a 
chimeric bivalent antibody. 

Humanized Antibodies 

The antibodies directed against the protein antigens of the invention can further 
comprise humanized antibodies or human antibodies. These antibodies are suitable for 
administration to humans without engendering an immune response by the human against the 
administered immunoglobulin. Humanized forms of antibodies are chimeric 
immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 
or other antigen-binding subsequences of antibodies) that are principally comprised of the 
sequence of a human immunoglobulin, and contain minimal sequence derived from a 
non-human immunoglobulin. Humanization can be performed following the method of 
Winter and co-workers (Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 
332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)), by substituting 
rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. (See 
also U.S. Patent No. 5,225,539.) In some instances, Fv framework residues of the human 
immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies 
can also comprise residues which are found neither in the recipient antibody nor in the 
imported CDR or framework sequences. In general, the humanized antibody will comprise 
substantially all of at least one, and typically two, variable domains, in which all or 
substantially all of the CDR regions correspond to those of a non-human immunoglobulin 
and all or substantially all of the framework regions are those of a human immunoglobulin 
consensus sequence. The humanized antibody optimally also will comprise at least a portion 
of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin (Jones 
et al., 1986; Riechmann et al., 1988; and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)). 

Human Antibodies 

Fully human antibodies essentially relate to antibody molecules in which the entire 
sequence of both the light chain and the heavy chain, including the CDRs, arises from human 
genes. Such antibodies are termed "human antibodies", or "fully human antibodies" herein. 
Human monoclonal antibodies can be prepared by the trioma technique; the human B-cell 
hybridoma technique (see Kozbor, et al., 1983 Immunol Today 4: 72) and the EBV 
hybridoma technique to produce human monoclonal antibodies (see Cole, et al., 1985 In: 
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human 
monoclonal antibodies may be utilized in the practice of the present invention and may be 
produced by using human hybridomas (see Cote, et al., 1983. Proc Natl Acad Sci USA 80: 
2026-2030) or by transforming human B-cells with Epstein Barr Virus in vitro (see Cole, et 
al., 1985 In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 
77-96). 

In addition, human antibodies can also be produced using additional techniques, 
including phage display libraries (Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); 
Marks et al., J. Mol. Biol., 222:581 (1991)). Similarly, human antibodies can be made by 
introducing human immunoglobulin loci into transgenic animals, e.g., mice in which the 
endogenous immunoglobulin genes have been partially or completely inactivated. Upon 
challenge, human antibody production is observed, which closely resembles that seen in 
humans in all respects, including gene rearrangement, assembly, and antibody repertoire. 
This approach is described, for example, in U.S. Patent Nos. 5,545,807; 5,545,806; 
5,569,825; 5,625,126; 5,633,425; 5,661,016, and in Marks et al. (Bio/Technology 10, 
779-783 (1992)); Lonberg et al. (Nature 368, 856-859 (1994)); Morrison (Nature 368, 
812-13 (1994)); Fishwild et al. (Nature Biotechnology 14, 845-51 (1996)); Neuberger 
(Nature Biotechnology 14, 826 (1996)); and Lonberg and Huszar (Intern. Rev. Immunol. 13 
65-93 (1995)). 

Human antibodies may additionally be produced using transgenic nonhuman animals 
which are modified so as to produce fully human antibodies rather than the animal's 
endogenous antibodies in response to challenge by an antigen. (See PCT publication 
WO94/02602). The endogenous genes encoding the heavy and light immunoglobulin chains 
in the nonhuman host have been incapacitated, and active loci encoding human heavy and 
light chain immunoglobulins are inserted into the host's genome. The human genes are 
incorporated, for example, using yeast artificial chromosomes containing the requisite human 
DNA segments. An animal which provides all the desired modifications is then obtained as 
progeny by crossbreeding intermediate transgenic animals containing fewer than the full 
complement of the modifications. The preferred embodiment of such a nonhuman animal is 
a mouse, and is termed the Xenomouse™ as disclosed in PCT publications WO 96/33735 
and WO 96/34096. This animal produces B cells which secrete fully human 
immunoglobulins. The antibodies can be obtained directly from the animal after 
immunization with an immunogen of interest, as, for example, a preparation of a polyclonal 
antibody, or alternatively from immortalized B cells derived from the animal, such as 
hybridomas producing monoclonal antibodies. Additionally, the genes encoding the 
immunoglobulins with human variable regions can be recovered and expressed to obtain the 
antibodies directly, or can be further modified to obtain analogs of antibodies such as, for 
example, single chain Fv molecules. 

An example of a method of producing a nonhuman host, exemplified as a mouse, 
lacking expression of an endogenous immunoglobulin heavy chain is disclosed in U.S. Patent 
No. 5,939,598. It can be obtained by a method including deleting the J segment genes from 
at least one endogenous heavy chain locus in an embryonic stem cell to prevent 
rearrangement of the locus and to prevent formation of a transcript of a rearranged 
immunoglobulin heavy chain locus, the deletion being effected by a targeting vector 
containing a gene encoding a selectable marker; and producing from the embryonic stem cell 
a transgenic mouse whose somatic and germ cells contain the gene encoding the selectable 
marker. 

A method for producing an antibody of interest, such as a human antibody, is 
disclosed in U.S. Patent No. 5,916,771. It includes introducing an expression vector that 
contains a nucleotide sequence encoding a heavy chain into one mammalian host cell in 
culture, introducing an expression vector containing a nucleotide sequence encoding a light 
chain into another mammalian host cell, and fusing the two cells to form a hybrid cell. The 
hybrid cell expresses an antibody containing the heavy chain and the light chain. 

In a further improvement on this procedure, a method for identifying a clinically 
relevant epitope on an immunogen, and a correlative method for selecting an antibody that 
binds immunospecifically to the relevant epitope with high affinity, are disclosed in PCT 
publication WO 99/53049. 

Fab Fragments and Single Chain Antibodies 

According to the invention, techniques can be adapted for the production of 
single-chain antibodies specific to an antigenic protein of the invention (see e.g., U.S. Patent 
No. 4,946,778). In addition, methods can be adapted for the construction of Fab expression 
libraries (see e.g., Huse, et al., 1989 Science 246: 1275-1281) to allow rapid and effective 
identification of monoclonal Fab fragments with the desired specificity for a protein or 
derivatives, fragments, analogs or homologs thereof. Antibody fragments that contain the 
idiotypes to a protein antigen may be produced by techniques known in the art including, but 
not limited to: (i) an F(ab')2 fragment produced by pepsin digestion of an antibody molecule; 
(ii) an Fab fragment generated by reducing the disulfide bridges of an F(ab')2 fragment; (iii) an 
Fab fragment generated by the treatment of the antibody molecule with papain and a reducing 
agent and (iv) Fv fragments. 

Bispecific Antibodies 

Bispecific antibodies are monoclonal, preferably human or humanized, antibodies that 
have binding specificities for at least two different antigens. In the present case, one of the 
binding specificities is for an antigenic protein of the invention. The second binding target is 
any other antigen, and advantageously is a cell-surface protein or receptor or receptor 
subunit. 

Methods for making bispecific antibodies are known in the art. Traditionally, the 
recombinant production of bispecific antibodies is based on the co-expression of two 
immunoglobulin heavy-chain/light-chain pairs, where the two heavy chains have different 
specificities (Milstein and Cuello, Nature, 305:537-539 (1983)). Because of the random 
assortment of immunoglobulin heavy and light chains, these hybridomas (quadromas) 
produce a potential mixture of ten different antibody molecules, of which only one has the 
correct bispecific structure. The purification of the correct molecule is usually accomplished 
by affinity chromatography steps. Similar procedures are disclosed in WO 93/08829, 
published 13 May 1993, and in Traunecker et al, EMBO J, 10:3655-3659 (1991). 
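
The figure of ten different antibody molecules noted above can be reproduced by a short 
enumeration, offered here only as an illustrative sketch. It assumes two heavy chains (H1, 
H2) and two light chains (L1, L2) that assort at random, with H1/L1 and H2/L2 taken as the 
cognate pairs; these labels and the function names are hypothetical. 

```python
from itertools import combinations_with_replacement, product


def quadroma_products(heavy=("H1", "H2"), light=("L1", "L2")):
    """Enumerate distinct antibody molecules from random chain assortment.

    Each molecule is modelled as an unordered pair of heavy/light "arms".
    """
    arms = [h + l for h, l in product(heavy, light)]   # 4 possible arms
    return list(combinations_with_replacement(arms, 2))


def is_bispecific(molecule):
    """True only when both arms are cognate pairs of different specificity."""
    return set(molecule) == {"H1L1", "H2L2"}
```

Under these assumptions the four possible arms combine into ten distinct molecules, exactly 
one of which carries both correctly paired binding sites, which is consistent with the need 
for the affinity chromatography steps described above. 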

Antibody variable domains with the desired binding specificities (antibody-antigen 
combining sites) can be fused to immunoglobulin constant domain sequences. The fusion 
preferably is with an immunoglobulin heavy-chain constant domain, comprising at least part 
of the hinge, CH2, and CH3 regions. It is preferred to have the first heavy-chain constant 
region (CH1) containing the site necessary for light-chain binding present in at least one of 
the fusions. DNAs encoding the immunoglobulin heavy-chain fusions and, if desired, the 
immunoglobulin light chain, are inserted into separate expression vectors, and are 
co-transfected into a suitable host organism. For further details of generating bispecific 
antibodies see, for example, Suresh et al., Methods in Enzymology, 121:210 (1986). 

According to another approach described in WO 96/27011, the interface between a 
pair of antibody molecules can be engineered to maximize the percentage of heterodimers 
which are recovered from recombinant cell culture. The preferred interface comprises at least 
a part of the CH3 region of an antibody constant domain. In this method, one or more small 
amino acid side chains from the interface of the first antibody molecule are replaced with 
larger side chains (e.g. tyrosine or tryptophan). Compensatory "cavities" of identical or 
similar size to the large side chain(s) are created on the interface of the second antibody 
molecule by replacing large amino acid side chains with smaller ones (e.g. alanine or 
threonine). This provides a mechanism for increasing the yield of the heterodimer over other 
unwanted end-products such as homodimers. 

Bispecific antibodies can be prepared as full length antibodies or antibody fragments 
(e.g. F(ab')2 bispecific antibodies). Techniques for generating bispecific antibodies from 
antibody fragments have been described in the literature. For example, bispecific antibodies 
can be prepared using chemical linkage. Brennan et al., Science 229:81 (1985) describe a 
procedure wherein intact antibodies are proteolytically cleaved to generate F(ab')2 fragments. 
These fragments are reduced in the presence of the dithiol complexing agent sodium arsenite 
to stabilize vicinal dithiols and prevent intermolecular disulfide formation. The Fab' 
fragments generated are then converted to thionitrobenzoate (TNB) derivatives. One of the 
Fab'-TNB derivatives is then reconverted to the Fab'-thiol by reduction with 

mercaptoethylamine and is mixed with an equimolar amount of the other Fab'-TNB 
derivative to form the bispecific antibody. The bispecific antibodies produced can be used as 
agents for the selective immobilization of enzymes. 

Additionally, Fab' fragments can be directly recovered from E. coli and chemically 
coupled to form bispecific antibodies. Shalaby et al., J. Exp. Med. 175:217-225 (1992) 
describe the production of a fully humanized bispecific antibody F(ab')2 molecule. Each 
Fab' fragment was separately secreted from E. coli and subjected to directed chemical 
coupling in vitro to form the bispecific antibody. The bispecific antibody thus formed was 
able to bind to cells overexpressing the ErbB2 receptor and normal human T cells, as well as 
trigger the lytic activity of human cytotoxic lymphocytes against human breast tumor targets. 

Various techniques for making and isolating bispecific antibody fragments directly 
from recombinant cell culture have also been described. For example, bispecific antibodies 
have been produced using leucine zippers. Kostelny et al., J. Immunol. 148(5):1547-1553 
(1992). The leucine zipper peptides from the Fos and Jun proteins were linked to the Fab' 
portions of two different antibodies by gene fusion. The antibody homodimers were reduced 
at the hinge region to form monomers and then re-oxidized to form the antibody 
heterodimers. This method can also be utilized for the production of antibody homodimers. 

The "diabody" technology described by Hollinger et al., Proc. Natl. Acad. Sci. USA 
90:6444-6448 (1993) has provided an alternative mechanism for making bispecific antibody 
fragments. The fragments comprise a heavy-chain variable domain (VH) connected to a 
light-chain variable domain (VL) by a linker which is too short to allow pairing between the 
two domains on the same chain. Accordingly, the VH and VL domains of one fragment are 
forced to pair with the complementary VL and VH domains of another fragment, thereby 
forming two antigen-binding sites. Another strategy for making bispecific antibody 
fragments by the use of single-chain Fv (sFv) dimers has also been reported. See, Gruber et 
al., J. Immunol. 152:5368 (1994). 

Antibodies with more than two valencies are contemplated. For example, trispecific 
antibodies can be prepared. Tutt et al., J. Immunol. 147:60 (1991). 

Exemplary bispecific antibodies can bind to two different epitopes, at least one of 
which originates in the protein antigen of the invention. Alternatively, an anti-antigenic arm 
of an immunoglobulin molecule can be combined with an arm which binds to a triggering 
molecule on a leukocyte such as a T-cell receptor molecule (e.g. CD2, CD3, CD28, or B7), or 
Fc receptors for IgG (FcγR), such as FcγRI (CD64), FcγRII (CD32) and FcγRIII (CD16) so 
as to focus cellular defense mechanisms to the cell expressing the particular antigen. 
Bispecific antibodies can also be used to direct cytotoxic agents to cells which express a 
particular antigen. These antibodies possess an antigen-binding arm and an arm which binds 
a cytotoxic agent or a radionuclide chelator, such as EOTUBE, DTPA, DOTA, or TETA. 
Another bispecific antibody of interest binds the protein antigen described herein and further 
binds tissue factor (TF). 



WO 03/023002 




T/US02/28539 



Heteroconjugate Antibodies 

Heteroconjugate antibodies are also within the scope of the present invention. 
Heteroconjugate antibodies are composed of two covalently joined antibodies. Such 
antibodies have, for example, been proposed to target immune system cells to unwanted cells 
(U.S. Patent No. 4,676,980), and for treatment of HIV infection (WO 91/00360; WO 
92/200373; EP 03089). It is contemplated that the antibodies can be prepared in vitro using 
known methods in synthetic protein chemistry, including those involving crosslinking agents. 
For example, immunotoxins can be constructed using a disulfide exchange reaction or by 
forming a thioether bond. Examples of suitable reagents for this purpose include 
iminothiolate and methyl-4-mercaptobutyrimidate and those disclosed, for example, in U.S. 
Patent No. 4,676,980. 

Effector Function Engineering 

It can be desirable to modify the antibody of the invention with respect to effector 
function, so as to enhance, e.g., the effectiveness of the antibody in treating cancer. For 
example, cysteine residue(s) can be introduced into the Fc region, thereby allowing interchain 
disulfide bond formation in this region. The homodimeric antibody thus generated can have 
improved internalization capability and/or increased complement-mediated cell killing and 
antibody-dependent cellular cytotoxicity (ADCC). See Caron et al., J. Exp Med., 176: 
1191-1195 (1992) and Shopes, J. Immunol., 148: 2918-2922 (1992). Homodimeric 
antibodies with enhanced anti-tumor activity can also be prepared using heterobifunctional 
cross-linkers as described in Wolff et al. Cancer Research, 53: 2560-2565 (1993). 
Alternatively, an antibody can be engineered that has dual Fc regions and can thereby have 
enhanced complement lysis and ADCC capabilities. See Stevenson et al, Anti-Cancer Drug 
Design, 3: 219-230 (1989). 

Immunoconjugates 

The invention also pertains to immunoconjugates comprising an antibody conjugated 
to a cytotoxic agent such as a chemotherapeutic agent, toxin (e.g., an enzymatically active 
toxin of bacterial, fungal, plant, or animal origin, or fragments thereof), or a radioactive 
isotope (i.e., a radioconjugate). 

Chemotherapeutic agents useful in the generation of such immunoconjugates have 
been described above. Enzymatically active toxins and fragments thereof that can be used 
include diphtheria A chain, nonbinding active fragments of diphtheria toxin, exotoxin A 
chain (from Pseudomonas aeruginosa), ricin A chain, abrin A chain, modeccin A chain, 
alpha-sarcin, Aleurites fordii proteins, dianthin proteins, Phytolacca americana proteins (PAPI, 
PAPII, and PAP-S), momordica charantia inhibitor, curcin, crotin, saponaria officinalis 
inhibitor, gelonin, mitogellin, restrictocin, phenomycin, enomycin, and the tricothecenes. A 
variety of radionuclides are available for the production of radioconjugated antibodies. 
Examples include ²¹²Bi, ¹³¹I, ¹³¹In, ⁹⁰Y, and ¹⁸⁶Re. 

Conjugates of the antibody and cytotoxic agent are made using a variety of 
bifunctional protein-coupling agents such as N-succinimidyl-3-(2-pyridyldithiol) propionate 
(SPDP), iminothiolane (IT), bifunctional derivatives of imidoesters (such as dimethyl 
adipimidate HCl), active esters (such as disuccinimidyl suberate), aldehydes (such as 
glutaraldehyde), bis-azido compounds (such as bis (p-azidobenzoyl) hexanediamine), 
bis-diazonium derivatives (such as bis-(p-diazoniumbenzoyl)-ethylenediamine), 
diisocyanates (such as tolylene 2,6-diisocyanate), and bis-active fluorine compounds (such as 
1,5-difluoro-2,4-dinitrobenzene). For example, a ricin immunotoxin can be prepared as 
described in Vitetta et al., Science, 238: 1098 (1987). Carbon-14-labeled 
1-isothiocyanatobenzyl-3-methyldiethylene triaminepentaacetic acid (MX-DTPA) is an 
exemplary chelating agent for conjugation of radionuclide to the antibody. See 
WO94/11026. 

In another embodiment, the antibody can be conjugated to a "receptor" (such as 
streptavidin) for utilization in tumor pretargeting wherein the antibody-receptor conjugate is 
administered to the patient, followed by removal of unbound conjugate from the circulation 
using a clearing agent and then administration of a "ligand" (e.g., avidin) that is in turn 
conjugated to a cytotoxic agent. 

Immunoliposomes 

The antibodies disclosed herein can also be formulated as immunoliposomes. 
Liposomes containing the antibody are prepared by methods known in the art, such as 
described in Epstein et al., Proc. Natl. Acad. Sci. USA, 82: 3688 (1985); Hwang et al., Proc. 
Natl Acad. Sci. USA, 77: 4030 (1980); and U.S. Pat. Nos. 4,485,045 and 4,544,545. 
Liposomes with enhanced circulation time are disclosed in U.S. Patent No. 5,013,556. 

Particularly useful liposomes can be generated by the reverse-phase evaporation 
method with a lipid composition comprising phosphatidylcholine, cholesterol, and 
PEG-derivatized phosphatidylethanolamine (PEG-PE). Liposomes are extruded through 
filters of defined pore size to yield liposomes with the desired diameter. Fab' fragments of 
the antibody of the present invention can be conjugated to the liposomes as described in 
Martin et al., J. Biol. Chem., 257: 286-288 (1982) via a disulfide-interchange reaction. A 
chemotherapeutic agent (such as Doxorubicin) is optionally contained within the liposome. 
See Gabizon et al., J. National Cancer Inst., 81(19): 1484 (1989). 

Diagnostic Applications of Antibodies Directed Against the Proteins of the 
Invention 

In one embodiment, methods for the screening of antibodies that possess the desired
specificity include, but are not limited to, enzyme linked immunosorbent assay (ELISA) and
other immunologically mediated techniques known within the art. In a specific embodiment,
selection of antibodies that are specific to a particular domain of a NOVX protein is
facilitated by generation of hybridomas that bind to the fragment of a NOVX protein
possessing such a domain. Thus, antibodies that are specific for a desired domain within a
NOVX protein, or derivatives, fragments, analogs or homologs thereof, are also provided
herein.

Antibodies directed against a NOVX protein of the invention may be used in methods
known within the art relating to the localization and/or quantitation of a NOVX protein (e.g.,
for use in measuring levels of the NOVX protein within appropriate physiological samples,
for use in diagnostic methods, for use in imaging the protein, and the like). In a given
embodiment, antibodies specific to a NOVX protein, or derivative, fragment, analog or
homolog thereof, that contain the antibody derived antigen binding domain, are utilized as
pharmacologically active compounds (referred to hereinafter as "Therapeutics").

An antibody specific for a NOVX protein of the invention (e.g., a monoclonal
antibody or a polyclonal antibody) can be used to isolate a NOVX polypeptide by standard
techniques, such as immunoaffinity chromatography or immunoprecipitation. An antibody
to a NOVX polypeptide can facilitate the purification of a natural NOVX antigen from cells,
or of a recombinantly produced NOVX antigen expressed in host cells. Moreover, such an
anti-NOVX antibody can be used to detect the antigenic NOVX protein (e.g., in a cellular
lysate or cell supernatant) in order to evaluate the abundance and pattern of expression of the
antigenic NOVX protein. Antibodies directed against a NOVX protein can be used
diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g.,
to determine the efficacy of a given treatment regimen. Detection can be
facilitated by coupling (i.e., physically linking) the antibody to a detectable substance.
Examples of detectable substances include various enzymes, prosthetic groups, fluorescent
materials, luminescent materials, bioluminescent materials, and radioactive materials.
Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes
include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials
include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine,
dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a
luminescent material includes luminol; examples of bioluminescent materials include
luciferase, luciferin, and aequorin; and examples of suitable radioactive material include 125I.



Antibody Therapeutics 

Antibodies of the invention, including polyclonal, monoclonal, humanized and fully
human antibodies, may be used as therapeutic agents. Such agents will generally be employed
to treat or prevent a disease or pathology in a subject. An antibody preparation, preferably
one having high specificity and high affinity for its target antigen, is administered to the
subject and will generally have an effect due to its binding with the target. Such an effect
may be one of two kinds, depending on the specific nature of the interaction between the
given antibody molecule and the target antigen in question. In the first instance,
administration of the antibody may abrogate or inhibit the binding of the target with an
endogenous ligand to which it naturally binds. In this case, the antibody binds to the target
and masks a binding site of the naturally occurring ligand, wherein the ligand serves as an
effector molecule. Thus the receptor mediates a signal transduction pathway for which the
ligand is responsible.

Alternatively, the effect may be one in which the antibody elicits a physiological
result by virtue of binding to an effector binding site on the target molecule. In this case the
target, a receptor having an endogenous ligand which may be absent or defective in the
disease or pathology, binds the antibody as a surrogate effector ligand, initiating a
receptor-based signal transduction event by the receptor.

A therapeutically effective amount of an antibody of the invention relates generally to
the amount needed to achieve a therapeutic objective. As noted above, this may be a binding
interaction between the antibody and its target antigen that, in certain cases, interferes with
the functioning of the target, and in other cases, promotes a physiological response. The
amount required to be administered will furthermore depend on the binding affinity of the
antibody for its specific antigen, and will also depend on the rate at which an administered
antibody is depleted from the free volume of the subject to which it is administered. Common
ranges for therapeutically effective dosing of an antibody or antibody fragment of the
invention may be, by way of nonlimiting example, from about 0.1 mg/kg body weight to
about 50 mg/kg body weight. Common dosing frequencies may range, for example, from
twice daily to once a week.
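
The per-kilogram range stated above converts to an absolute amount by simple multiplication. The following is a minimal sketch of that arithmetic only, not a dosing recommendation: the function name `dose_range_mg` and the 70 kg example weight are hypothetical, and only the 0.1 mg/kg and 50 mg/kg bounds are taken from the text.

```python
# Illustrative arithmetic only: converts the nonlimiting per-kilogram range
# given in the text (about 0.1 mg/kg to about 50 mg/kg) into milligrams.
LOW_MG_PER_KG = 0.1    # lower bound stated in the text
HIGH_MG_PER_KG = 50.0  # upper bound stated in the text


def dose_range_mg(body_weight_kg: float) -> tuple[float, float]:
    """Return (minimum, maximum) amount in mg for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (LOW_MG_PER_KG * body_weight_kg, HIGH_MG_PER_KG * body_weight_kg)


if __name__ == "__main__":
    low, high = dose_range_mg(70.0)  # hypothetical 70 kg subject
    print(f"{low:.1f} mg to {high:.1f} mg per administration")
```

The range is deliberately kept as two module-level constants so that a different nonlimiting range can be substituted without touching the function.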

Pharmaceutical Compositions of Antibodies 

Antibodies specifically binding a protein of the invention, as well as other molecules
identified by the screening assays disclosed herein, can be administered for the treatment of
various disorders in the form of pharmaceutical compositions. Principles and considerations
involved in preparing such compositions, as well as guidance in the choice of components are
provided, for example, in Remington: The Science And Practice Of Pharmacy 19th ed.
(Alfonso R. Gennaro, et al., editors) Mack Pub. Co., Easton, Pa.: 1995; Drug Absorption
Enhancement: Concepts, Possibilities, Limitations, And Trends, Harwood Academic
Publishers, Langhorne, Pa., 1994; and Peptide And Protein Drug Delivery (Advances In
Parenteral Sciences, Vol. 4), 1991, M. Dekker, New York.

If the antigenic protein is intracellular and whole antibodies are used as inhibitors,
internalizing antibodies are preferred. However, liposomes can also be used to deliver the
antibody, or an antibody fragment, into cells. Where antibody fragments are used, the
smallest inhibitory fragment that specifically binds to the binding domain of the target protein
is preferred. For example, based upon the variable-region sequences of an antibody, peptide
molecules can be designed that retain the ability to bind the target protein sequence. Such
peptides can be synthesized chemically and/or produced by recombinant DNA technology.
See, e.g., Marasco et al., Proc. Natl. Acad. Sci. USA, 90: 7889-7893 (1993). The
formulation herein can also contain more than one active compound as necessary for the
particular indication being treated, preferably those with complementary activities that do not
adversely affect each other. Alternatively, or in addition, the composition can comprise an
agent that enhances its function, such as, for example, a cytotoxic agent, cytokine,
chemotherapeutic agent, or growth-inhibitory agent. Such molecules are suitably present in
combination in amounts that are effective for the purpose intended.

The active ingredients can also be entrapped in microcapsules prepared, for example,
by coacervation techniques or by interfacial polymerization, for example,
hydroxymethylcellulose or gelatin-microcapsules and poly-(methylmethacrylate)
microcapsules, respectively, in colloidal drug delivery systems (for example, liposomes,
albumin microspheres, microemulsions, nano-particles, and nanocapsules) or in
macroemulsions.

The formulations to be used for in vivo administration must be sterile. This is readily 
accomplished by filtration through sterile filtration membranes. 

Sustained-release preparations can be prepared. Suitable examples of
sustained-release preparations include semipermeable matrices of solid hydrophobic
polymers containing the antibody, which matrices are in the form of shaped articles, e.g.,
films, or microcapsules. Examples of sustained-release matrices include polyesters,
hydrogels (for example, poly(2-hydroxyethyl-methacrylate), or poly(vinylalcohol)),
polylactides (U.S. Pat. No. 3,773,919), copolymers of L-glutamic acid and
γ-ethyl-L-glutamate, non-degradable ethylene-vinyl acetate, degradable lactic acid-glycolic
acid copolymers such as the LUPRON DEPOT™ (injectable microspheres composed of
lactic acid-glycolic acid copolymer and leuprolide acetate), and poly-D-(-)-3-hydroxybutyric
acid. While polymers such as ethylene-vinyl acetate and lactic acid-glycolic acid enable
release of molecules for over 100 days, certain hydrogels release proteins for shorter time
periods.

ELISA Assay 

An agent for detecting an analyte protein is an antibody capable of binding to an
analyte protein, preferably an antibody with a detectable label. Antibodies can be polyclonal,
or more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2)
can be used. The term "labeled", with regard to the probe or antibody, is intended to
encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a
detectable substance to the probe or antibody, as well as indirect labeling of the probe or
antibody by reactivity with another reagent that is directly labeled. Examples of indirect
labeling include detection of a primary antibody using a fluorescently-labeled secondary
antibody and end-labeling of a DNA probe with biotin such that it can be detected with
fluorescently-labeled streptavidin. The term "biological sample" is intended to include
tissues, cells and biological fluids isolated from a subject, as well as tissues, cells and fluids
present within a subject. Included within the usage of the term "biological sample",
therefore, is blood and a fraction or component of blood including blood serum, blood
plasma, or lymph. That is, the detection method of the invention can be used to detect an
analyte mRNA, protein, or genomic DNA in a biological sample in vitro as well as in vivo.
For example, in vitro techniques for detection of an analyte mRNA include Northern
hybridizations and in situ hybridizations. In vitro techniques for detection of an analyte
protein include enzyme linked immunosorbent assays (ELISAs), Western blots,
immunoprecipitations, and immunofluorescence. In vitro techniques for detection of an
analyte genomic DNA include Southern hybridizations. Procedures for conducting
immunoassays are described, for example, in "ELISA: Theory and Practice: Methods in
Molecular Biology", Vol. 42, J. R. Crowther (Ed.) Humana Press, Totowa, NJ, 1995;
"Immunoassay", E. Diamandis and T. Christopoulos, Academic Press, Inc., San Diego, CA,
1996; and "Practice and Theory of Enzyme Immunoassays", P. Tijssen, Elsevier Science
Publishers, Amsterdam, 1985. Furthermore, in vivo techniques for detection of an analyte
protein include introducing into a subject a labeled anti-analyte protein antibody. For
example, the antibody can be labeled with a radioactive marker whose presence and location
in a subject can be detected by standard imaging techniques.

NOVX Recombinant Expression Vectors and Host Cells 

Another aspect of the invention pertains to vectors, preferably expression vectors,
containing a nucleic acid encoding a NOVX protein, or derivatives, fragments, analogs or
homologs thereof. As used herein, the term "vector" refers to a nucleic acid molecule capable
of transporting another nucleic acid to which it has been linked. One type of vector is a
"plasmid", which refers to a circular double stranded DNA loop into which additional DNA
segments can be ligated. Another type of vector is a viral vector, wherein additional DNA
segments can be ligated into the viral genome. Certain vectors are capable of autonomous
replication in a host cell into which they are introduced (e.g., bacterial vectors having a
bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g.,
non-episomal mammalian vectors) are integrated into the genome of a host cell upon
introduction into the host cell, and thereby are replicated along with the host genome.
Moreover, certain vectors are capable of directing the expression of genes to which they are
operatively-linked. Such vectors are referred to herein as "expression vectors". In general,
expression vectors of utility in recombinant DNA techniques are often in the form of
plasmids. In the present specification, "plasmid" and "vector" can be used interchangeably as
the plasmid is the most commonly used form of vector. However, the invention is intended
to include such other forms of expression vectors, such as viral vectors (e.g., replication
defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent
functions.

The recombinant expression vectors of the invention comprise a nucleic acid of the
invention in a form suitable for expression of the nucleic acid in a host cell, which means that
the recombinant expression vectors include one or more regulatory sequences, selected on the
basis of the host cells to be used for expression, that is operatively-linked to the nucleic acid
sequence to be expressed. Within a recombinant expression vector, "operably-linked" is
intended to mean that the nucleotide sequence of interest is linked to the regulatory
sequence(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in
vitro transcription/translation system or in a host cell when the vector is introduced into the
host cell).

The term "regulatory sequence" is intended to include promoters, enhancers and
other expression control elements (e.g., polyadenylation signals). Such regulatory sequences
are described, for example, in Goeddel, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include
those that direct constitutive expression of a nucleotide sequence in many types of host cell
and those that direct expression of the nucleotide sequence only in certain host cells (e.g.,
tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the
design of the expression vector can depend on such factors as the choice of the host cell to be
transformed, the level of expression of protein desired, etc. The expression vectors of the
invention can be introduced into host cells to thereby produce proteins or peptides, including
fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., NOVX
proteins, mutant forms of NOVX proteins, fusion proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression
of NOVX proteins in prokaryotic or eukaryotic cells. For example, NOVX proteins can be
expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus
expression vectors), yeast cells or mammalian cells. Suitable host cells are discussed further in
Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press,
San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be
transcribed and translated in vitro, for example using T7 promoter regulatory sequences and
T7 polymerase.

Expression of proteins in prokaryotes is most often carried out in Escherichia coli
with vectors containing constitutive or inducible promoters directing the expression of either
fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein
encoded therein, usually to the amino terminus of the recombinant protein. Such fusion
vectors typically serve three purposes: (i) to increase expression of recombinant protein; (ii)
to increase the solubility of the recombinant protein; and (iii) to aid in the purification of the
recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression
vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the
recombinant protein to enable separation of the recombinant protein from the fusion moiety
subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition
sequences, include Factor Xa, thrombin and enterokinase. Typical fusion expression vectors
include pGEX (Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL
(New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse
glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the
target recombinant protein.
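
The separation of a fusion moiety from the recombinant protein at an engineered cleavage site can be sketched as a string operation on the amino acid sequence. This is a minimal illustration under stated assumptions: the recognition motifs below are the commonly cited ones for these proteases and are not given in this specification, and the function name `split_fusion` and the example fusion sequence are hypothetical.

```python
# Commonly cited recognition motifs (an assumption; the text names the
# proteases but not their sequences). Each value is
# (motif, number of motif residues that stay on the N-terminal fragment).
CLEAVAGE_SITES = {
    "Factor Xa": ("IEGR", 4),      # cut after the Arg of Ile-Glu-Gly-Arg
    "thrombin": ("LVPRGS", 4),     # cut between Arg and Gly
    "enterokinase": ("DDDDK", 5),  # cut after the Lys of Asp4-Lys
}


def split_fusion(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion protein at the first recognition site of a protease.

    Returns (N-terminal fragment, C-terminal fragment).
    """
    motif, offset = CLEAVAGE_SITES[protease]
    start = fusion.find(motif)
    if start < 0:
        raise ValueError(f"no {protease} site found")
    cut = start + offset
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical fusion: tag + Factor Xa site + target protein.
    tag, target = split_fusion("MSPILGYWKIEGRMKTAYIAK", "Factor Xa")
    print(tag, target)
```

Only the first matching site is used here; a real construct would also need to be checked for internal occurrences of the motif within the target protein.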

Examples of suitable inducible non-fusion E. coli expression vectors include pTrc
(Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier et al., Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990)
60-89).

One strategy to maximize recombinant protein expression in E. coli is to express the
protein in a host bacteria with an impaired capacity to proteolytically cleave the recombinant
protein. See, e.g., Gottesman, Gene Expression Technology: Methods in Enzymology
185, Academic Press, San Diego, Calif. (1990) 119-128. Another strategy is to alter the
nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that the
individual codons for each amino acid are those preferentially utilized in E. coli (see, e.g.,
Wada, et al., 1992. Nucl. Acids Res. 20: 2111-2118). Such alteration of nucleic acid
sequences of the invention can be carried out by standard DNA synthesis techniques.
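
The codon-alteration strategy described above amounts to writing each amino acid with a codon the host uses preferentially. The sketch below is illustrative only: the table lists one frequently used E. coli codon per amino acid as an assumption for the example, not a table taken from this specification or from the cited reference, and `back_translate` is a hypothetical helper name.

```python
# Illustrative table: one frequently used E. coli codon per amino acid.
# These choices are an assumption for the example; a real design would
# take its preferences from a measured codon-usage table for the host.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Write a protein (one-letter code, '*' for stop) as DNA using the
    single preferred codon for every residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"unknown residue: {exc.args[0]}") from None


if __name__ == "__main__":
    print(back_translate("MKL*"))
```

Using a single codon per residue keeps the example short; practical designs often vary codons to avoid repeats and unwanted restriction sites.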

In another embodiment, the NOVX expression vector is a yeast expression vector.
Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSec1
(Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30:
933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation,
San Diego, Calif.), and picZ (Invitrogen Corp., San Diego, Calif.).

Alternatively, NOVX can be expressed in insect cells using baculovirus expression
vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g.,
SF9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the
pVL series (Luckow and Summers, 1989. Virology 170: 31-39).

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian
cells using a mammalian expression vector. Examples of mammalian expression vectors
include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO
J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are
often provided by viral regulatory elements. For example, commonly used promoters are
derived from polyoma, adenovirus 2, cytomegalovirus, and simian virus 40. For other
suitable expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16
and 17 of Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold
Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.



In another embodiment, the recombinant mammalian expression vector is capable of
directing expression of the nucleic acid preferentially in a particular cell type (e.g.,
tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific
regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific
promoters include the albumin promoter (liver-specific; Pinkert, et al., 1987. Genes Dev. 1:
268-277), lymphoid-specific promoters (Calame and Eaton, 1988. Adv. Immunol. 43:
235-275), in particular promoters of T cell receptors (Winoto and Baltimore, 1989. EMBO J.
8: 729-733) and immunoglobulins (Banerji, et al., 1983. Cell 33: 729-740; Queen and
Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament
promoter; Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477),
pancreas-specific promoters (Edlund, et al., 1985. Science 230: 912-916), and mammary
gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European
Application Publication No. 264,166). Developmentally-regulated promoters are also
encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249:
374-379) and the α-fetoprotein promoter (Camper and Tilghman, 1989. Genes Dev. 3:
537-546).

The invention further provides a recombinant expression vector comprising a DNA
molecule of the invention cloned into the expression vector in an antisense orientation. That
is, the DNA molecule is operatively-linked to a regulatory sequence in a manner that allows
for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense
to NOVX mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the
antisense orientation can be chosen that direct the continuous expression of the antisense
RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or
regulatory sequences can be chosen that direct constitutive, tissue specific or cell type
specific expression of antisense RNA. The antisense expression vector can be in the form of
a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are
produced under the control of a high efficiency regulatory region, the activity of which can be
determined by the cell type into which the vector is introduced. For a discussion of the
regulation of gene expression using antisense genes see, e.g., Weintraub, et al., "Antisense
RNA as a molecular tool for genetic analysis," Reviews-Trends in Genetics, Vol. 1(1) 1986.
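
An RNA that is antisense to an mRNA is its reverse complement, which is what transcription of an insert cloned in the antisense orientation yields. The following is a minimal sketch of that relationship; the function name `antisense_rna` and the example sequence are hypothetical and are not NOVX sequences.

```python
# Base-pairing table for RNA: A pairs with U, C pairs with G.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the antisense strand of an mRNA, written 5' to 3'.

    The antisense strand is the reverse complement of the input.
    """
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("expected an RNA sequence of A, C, G and U")
    # Complement every base, then reverse so the result reads 5' to 3'.
    return mrna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_rna("AUGGCC"))
```

Applying the function twice returns the original sequence, which is a quick sanity check that the two strands are complementary.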

Another aspect of the invention pertains to host cells into which a recombinant
expression vector of the invention has been introduced. The terms "host cell" and
"recombinant host cell" are used interchangeably herein. It is understood that such terms
refer not only to the particular subject cell but also to the progeny or potential progeny of
such a cell. Because certain modifications may occur in succeeding generations due to either
mutation or environmental influences, such progeny may not, in fact, be identical to the
parent cell, but are still included within the scope of the term as used herein.

A host cell can be any prokaryotic or eukaryotic cell. For example, NOVX protein
can be expressed in bacterial cells such as E. coli, insect cells, yeast or mammalian cells
(such as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are
known to those skilled in the art.

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional
transformation or transfection techniques. As used herein, the terms "transformation" and
"transfection" are intended to refer to a variety of art-recognized techniques for introducing
foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium
chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or
electroporation. Suitable methods for transforming or transfecting host cells can be found in
Sambrook, et al. (Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring
Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989),
and other laboratory manuals.

For stable transfection of mammalian cells, it is known that, depending upon the
expression vector and transfection technique used, only a small fraction of cells may integrate
the foreign DNA into their genome. In order to identify and select these integrants, a gene
that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into
the host cells along with the gene of interest. Various selectable markers include those that
confer resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid
encoding a selectable marker can be introduced into a host cell on the same vector as that
encoding NOVX or can be introduced on a separate vector. Cells stably transfected with the
introduced nucleic acid can be identified by drug selection (e.g., cells that have incorporated
the selectable marker gene will survive, while the other cells die).

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture,
can be used to produce (i.e., express) NOVX protein. Accordingly, the invention further
provides methods for producing NOVX protein using the host cells of the invention. In one
embodiment, the method comprises culturing the host cell of the invention (into which a
recombinant expression vector encoding NOVX protein has been introduced) in a suitable
medium such that NOVX protein is produced. In another embodiment, the method further
comprises isolating NOVX protein from the medium or the host cell.

Transgenic NOVX Animals

The host cells of the invention can also be used to produce non-human transgenic
animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte
or an embryonic stem cell into which NOVX protein-coding sequences have been introduced.
Such host cells can then be used to create non-human transgenic animals in which exogenous
NOVX sequences have been introduced into their genome or homologous recombinant
animals in which endogenous NOVX sequences have been altered. Such animals are useful
for studying the function and/or activity of NOVX protein and for identifying and/or
evaluating modulators of NOVX protein activity. As used herein, a "transgenic animal" is a
non-human animal, preferably a mammal, more preferably a rodent such as a rat or mouse, in
which one or more of the cells of the animal includes a transgene. Other examples of
transgenic animals include non-human primates, sheep, dogs, cows, goats, chickens,
amphibians, etc. A transgene is exogenous DNA that is integrated into the genome of a cell
from which a transgenic animal develops and that remains in the genome of the mature
animal, thereby directing the expression of an encoded gene product in one or more cell types
or tissues of the transgenic animal. As used herein, a "homologous recombinant animal" is a
non-human animal, preferably a mammal, more preferably a mouse, in which an endogenous
NOVX gene has been altered by homologous recombination between the endogenous gene
and an exogenous DNA molecule introduced into a cell of the animal, e.g., an embryonic cell
of the animal, prior to development of the animal.

A transgenic animal of the invention can be created by introducing NOVX-encoding
nucleic acid into the male pronuclei of a fertilized oocyte (e.g., by microinjection or retroviral
infection) and allowing the oocyte to develop in a pseudopregnant female foster animal. The
human NOVX cDNA sequences, i.e., any one of SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172, can be introduced as a transgene into the genome of a non-human animal.
Alternatively, a non-human homologue of the human NOVX gene, such as a mouse NOVX
gene, can be isolated based on hybridization to the human NOVX cDNA (described further
supra) and used as a transgene. Intronic sequences and polyadenylation signals can also be
included in the transgene to increase the efficiency of expression of the transgene. A
tissue-specific regulatory sequence(s) can be operably-linked to the NOVX transgene to
direct expression of NOVX protein to particular cells. Methods for generating transgenic
animals via embryo manipulation and microinjection, particularly animals such as mice, have
become conventional in the art and are described, for example, in U.S. Patent Nos. 4,736,866;
4,870,009; and 4,873,191; and Hogan, 1986. In: Manipulating the Mouse Embryo, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. Similar methods are used for
production of other transgenic animals. A transgenic founder animal can be identified based
upon the presence of the NOVX transgene in its genome and/or expression of NOVX mRNA
in tissues or cells of the animals. A transgenic founder animal can then be used to breed
additional animals carrying the transgene. Moreover, transgenic animals carrying a
transgene encoding NOVX protein can further be bred to other transgenic animals carrying
other transgenes.
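
The numbering rule SEQ ID NO:2n-1, with n an integer between 1 and 172, selects the odd-numbered sequence identifiers. A minimal sketch that enumerates them follows; the function name `nucleic_acid_seq_ids` is hypothetical, and the sketch assumes only the rule and the bounds on n as stated.

```python
MAX_N = 172  # upper bound on n stated in the text


def nucleic_acid_seq_ids(max_n: int = MAX_N) -> list[int]:
    """Return every SEQ ID NO of the form 2n-1 for n = 1..max_n."""
    return [2 * n - 1 for n in range(1, max_n + 1)]


if __name__ == "__main__":
    ids = nucleic_acid_seq_ids()
    # First identifier, last identifier, and how many there are.
    print(ids[0], ids[-1], len(ids))
```

With n running from 1 to 172 the identifiers run from 1 to 343 in steps of two, giving 172 identifiers in total.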

To create a homologous recombinant animal, a vector is prepared which contains at
least a portion of a NOVX gene into which a deletion, addition or substitution has been
introduced to thereby alter, e.g., functionally disrupt, the NOVX gene. The NOVX gene can
be a human gene (e.g., the cDNA of any one of SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172), but more preferably, is a non-human homologue of a human NOVX
gene. For example, a mouse homologue of human NOVX gene of SEQ ID NO:2n-1, wherein
n is an integer between 1 and 172, can be used to construct a homologous recombination
vector suitable for altering an endogenous NOVX gene in the mouse genome. In one
embodiment, the vector is designed such that, upon homologous recombination, the
endogenous NOVX gene is functionally disrupted (i.e., no longer encodes a functional
protein; also referred to as a "knock out" vector).

Alternatively, the vector can be designed such that, upon homologous recombination,
the endogenous NOVX gene is mutated or otherwise altered but still encodes functional
protein (e.g., the upstream regulatory region can be altered to thereby alter the expression of
the endogenous NOVX protein). In the homologous recombination vector, the altered
portion of the NOVX gene is flanked at its 5'- and 3'-termini by additional nucleic acid of the
NOVX gene to allow for homologous recombination to occur between the exogenous NOVX
gene carried by the vector and an endogenous NOVX gene in an embryonic stem cell. The
additional flanking NOVX nucleic acid is of sufficient length for successful homologous
recombination with the endogenous gene. Typically, several kilobases of flanking DNA
(both at the 5'- and 3'-termini) are included in the vector. See, e.g., Thomas, et al., 1987. Cell
51: 503 for a description of homologous recombination vectors. The vector is then introduced
into an embryonic stem cell line (e.g., by electroporation) and cells in which the introduced
NOVX gene has homologously-recombined with the endogenous NOVX gene are selected.
See, e.g., Li, et al., 1992. Cell 69: 915.

The selected cells are then injected into a blastocyst of an animal (e.g., a mouse) to
form aggregation chimeras. See, e.g., Bradley, 1987. In: Teratocarcinomas and
Embryonic Stem Cells: A Practical Approach, Robertson, ed. IRL, Oxford, pp.
113-152. A chimeric embryo can then be implanted into a suitable pseudopregnant female
foster animal and the embryo brought to term. Progeny harboring the
homologously-recombined DNA in their germ cells can be used to breed animals in which all
cells of the animal contain the homologously-recombined DNA by germline transmission of
the transgene. Methods for constructing homologous recombination vectors and homologous
recombinant animals are described further in Bradley, 1991. Curr. Opin. Biotechnol. 2:
823-829; PCT International Publication Nos.: WO 90/11354; WO 91/01140; WO 92/0968;
and WO 93/04169.

In another embodiment, transgenic non-human animals can be produced that contain
selected systems that allow for regulated expression of the transgene. One example of such a
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the
cre/loxP recombinase system, see, e.g., Lakso, et al., 1992. Proc. Natl. Acad. Sci. USA 89:
6232-6236. Another example of a recombinase system is the FLP recombinase system of
Saccharomyces cerevisiae. See, O'Gorman, et al., 1991. Science 251: 1351-1355. If a
cre/loxP recombinase system is used to regulate expression of the transgene, animals
containing transgenes encoding both the Cre recombinase and a selected protein are required.
Such animals can be provided through the construction of "double" transgenic animals, e.g.,
by mating two transgenic animals, one containing a transgene encoding a selected protein and
the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced
according to the methods described in Wilmut, et al., 1997. Nature 385: 810-813. In brief, a
cell (e.g., a somatic cell) from the transgenic animal can be isolated and induced to exit the
growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use
of electrical pulses, to an enucleated oocyte from an animal of the same species from which
the quiescent cell is isolated. The reconstructed oocyte is then cultured such that it develops
to morula or blastocyst and then transferred to a pseudopregnant female foster animal. The
offspring borne of this female foster animal will be a clone of the animal from which the cell
(e.g., the somatic cell) is isolated.

Pharmaceutical Compositions 

The NOVX nucleic acid molecules, NOVX proteins, and anti-NOVX antibodies (also
referred to herein as "active compounds") of the invention, and derivatives, fragments,
analogs and homologs thereof, can be incorporated into pharmaceutical compositions suitable
for administration. Such compositions typically comprise the nucleic acid molecule, protein,
or antibody and a pharmaceutically acceptable carrier. As used herein, "pharmaceutically
acceptable carrier" is intended to include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like,
compatible with pharmaceutical administration. Suitable carriers are described in the most
recent edition of Remington's Pharmaceutical Sciences, a standard reference text in the field,
which is incorporated herein by reference. Preferred examples of such carriers or diluents
include, but are not limited to, water, saline, Ringer's solutions, dextrose solution, and 5%
human serum albumin. Liposomes and non-aqueous vehicles such as fixed oils may also be
used. The use of such media and agents for pharmaceutically active substances is well
known in the art. Except insofar as any conventional media or agent is incompatible with the
active compound, use thereof in the compositions is contemplated. Supplementary active
compounds can also be incorporated into the compositions.

A pharmaceutical composition of the invention is formulated to be compatible with its
intended route of administration. Examples of routes of administration include parenteral,
e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (i.e., topical),
transmucosal, and rectal administration. Solutions or suspensions used for parenteral,
intradermal, or subcutaneous application can include the following components: a sterile
diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine,
propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or
methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such
as ethylenediaminetetraacetic acid (EDTA); buffers such as acetates, citrates or phosphates;
and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral
preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of
glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous
solutions (where water soluble) or dispersions and sterile powders for the extemporaneous
preparation of sterile injectable solutions or dispersion. For intravenous administration,
suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF,
Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be
sterile and should be fluid to the extent that easy syringeability exists. It must be stable under
the conditions of manufacture and storage and must be preserved against the contaminating
action of microorganisms such as bacteria and fungi. The carrier can be a solvent or
dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol,
propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof.
The proper fluidity can be maintained, for example, by the use of a coating such as lecithin,
by the maintenance of the required particle size in the case of dispersion and by the use of
surfactants. Prevention of the action of microorganisms can be achieved by various
antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic
acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents,
for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride in the
composition. Prolonged absorption of the injectable compositions can be brought about by
including in the composition an agent which delays absorption, for example, aluminum
monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound
(e.g., a NOVX protein or anti-NOVX antibody) in the required amount in an appropriate
solvent with one or a combination of ingredients enumerated above, as required, followed by
filtered sterilization. Generally, dispersions are prepared by incorporating the active
compound into a sterile vehicle that contains a basic dispersion medium and the required
other ingredients from those enumerated above. In the case of sterile powders for the
preparation of sterile injectable solutions, methods of preparation are vacuum drying and
freeze-drying that yields a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be
enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic
administration, the active compound can be incorporated with excipients and used in the form
of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier
for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and
swished and expectorated or swallowed. Pharmaceutically compatible binding agents, and/or
adjuvant materials can be included as part of the composition. The tablets, pills, capsules,
troches and the like can contain any of the following ingredients, or compounds of a similar
nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient
such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch;
a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide;
a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint,
methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an
aerosol spray from a pressurized container or dispenser which contains a suitable propellant,
e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For
transmucosal or transdermal administration, penetrants appropriate to the barrier to be
permeated are used in the formulation. Such penetrants are generally known in the art, and
include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid
derivatives. Transmucosal administration can be accomplished through the use of nasal
sprays or suppositories. For transdermal administration, the active compounds are
formulated into ointments, salves, gels, or creams as generally known in the art.

The compounds can also be prepared in the form of suppositories (e.g., with
conventional suppository bases such as cocoa butter and other glycerides) or retention
enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect
the compound against rapid elimination from the body, such as a controlled release
formulation, including implants and microencapsulated delivery systems. Biodegradable,
biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides,
polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of
such formulations will be apparent to those skilled in the art. The materials can also be
obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal
suspensions (including liposomes targeted to infected cells with monoclonal antibodies to
viral antigens) can also be used as pharmaceutically acceptable carriers. These can be
prepared according to methods known to those skilled in the art, for example, as described in
U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage
unit form for ease of administration and uniformity of dosage. Dosage unit form as used
herein refers to physically discrete units suited as unitary dosages for the subject to be
treated; each unit containing a predetermined quantity of active compound calculated to
produce the desired therapeutic effect in association with the required pharmaceutical carrier.
The specifications for the dosage unit forms of the invention are dictated by and directly
dependent on the unique characteristics of the active compound and the particular therapeutic
effect to be achieved, and the limitations inherent in the art of compounding such an active
compound for the treatment of individuals.

The nucleic acid molecules of the invention can be inserted into vectors and used as
gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example,
intravenous injection, local administration (see, e.g., U.S. Patent No. 5,328,470) or by
stereotactic injection (see, e.g., Chen, et al., 1994. Proc. Natl. Acad. Sci. USA 91:
3054-3057). The pharmaceutical preparation of the gene therapy vector can include the gene
therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the
gene delivery vehicle is imbedded. Alternatively, where the complete gene delivery vector
can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical
preparation can include one or more cells that produce the gene delivery system.

The pharmaceutical compositions can be included in a container, pack, or dispenser 
together with instructions for administration. 

Screening and Detection Methods

The isolated nucleic acid molecules of the invention can be used to express NOVX
protein (e.g., via a recombinant expression vector in a host cell in gene therapy applications),
to detect NOVX mRNA (e.g., in a biological sample) or a genetic lesion in a NOVX gene,
and to modulate NOVX activity, as described further, below. In addition, the NOVX proteins
can be used to screen drugs or compounds that modulate the NOVX protein activity or
expression as well as to treat disorders characterized by insufficient or excessive production
of NOVX protein or production of NOVX protein forms that have decreased or aberrant
activity compared to NOVX wild-type protein (e.g., diabetes (regulates insulin release);
obesity (binds and transports lipids); metabolic disturbances associated with obesity, the
metabolic syndrome X, as well as anorexia and wasting disorders associated with chronic
diseases and various cancers; infectious disease (possesses anti-microbial activity); and the
various dyslipidemias). In addition, the anti-NOVX antibodies of the invention can be used to
detect and isolate NOVX proteins and modulate NOVX activity. In yet a further aspect, the
invention can be used in methods to influence appetite, absorption of nutrients and the
disposition of metabolic substrates in both a positive and negative fashion.

The invention further pertains to novel agents identified by the screening assays 
described herein and uses thereof for treatments as described, supra. 

Screening Assays 

The invention provides a method (also referred to herein as a "screening assay") for
identifying modulators, i.e., candidate or test compounds or agents (e.g., peptides,
peptidomimetics, small molecules or other drugs) that bind to NOVX proteins or have a
stimulatory or inhibitory effect on, e.g., NOVX protein expression or NOVX protein activity.
The invention also includes compounds identified in the screening assays described herein.

In one embodiment, the invention provides assays for screening candidate or test
compounds which bind to or modulate the activity of the membrane-bound form of a NOVX
protein or polypeptide or biologically-active portion thereof. The test compounds of the
invention can be obtained using any of the numerous approaches in combinatorial library

WO 03/023002 




PCT/US02/28539 



methods known in the art, including: biological libraries; spatially addressable parallel solid
phase or solution phase libraries; synthetic library methods requiring deconvolution; the
"one-bead one-compound" library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is limited to peptide libraries,
while the other four approaches are applicable to peptide, non-peptide oligomer or small
molecule libraries of compounds. See, e.g., Lam, 1997. Anticancer Drug Design 12: 145.

A "small molecule" as used herein, is meant to refer to a composition that has a 
molecular weight of less than about 5 kD and most preferably less than about 4 kD. Small 
molecules can be, e.g., nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates,
lipids or other organic or inorganic molecules. Libraries of chemical and/or biological
mixtures, such as fungal, bacterial, or algal extracts, are known in the art and can be screened
with any of the assays of the invention.

Examples of methods for the synthesis of molecular libraries can be found in the art,
for example in: DeWitt, et al., 1993. Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb, et al., 1994.
Proc. Natl. Acad. Sci. U.S.A. 91: 11422; Zuckermann, et al., 1994. J. Med. Chem. 37: 2678;
Cho, et al., 1993. Science 261: 1303; Carell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33:
2059; Carell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop, et al., 1994. J.
Med. Chem. 37: 1233.

Libraries of compounds may be presented in solution (e.g., Houghten, 1992.
Biotechniques 13: 412-421), or on beads (Lam, 1991. Nature 354: 82-84), on chips (Fodor,
1993. Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner,
U.S. Patent 5,233,409), plasmids (Cull, et al., 1992. Proc. Natl. Acad. Sci. USA 89:
1865-1869) or on phage (Scott and Smith, 1990. Science 249: 386-390; Devlin, 1990. Science
249: 404-406; Cwirla, et al., 1990. Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382; Felici, 1991.
J. Mol. Biol. 222: 301-310; Ladner, U.S. Patent No. 5,233,409).

In one embodiment, an assay is a cell-based assay in which a cell which expresses a
membrane-bound form of NOVX protein, or a biologically-active portion thereof, on the cell
surface is contacted with a test compound and the ability of the test compound to bind to a
NOVX protein is determined. The cell, for example, can be of mammalian origin or a yeast cell.
Determining the ability of the test compound to bind to the NOVX protein can be
accomplished, for example, by coupling the test compound with a radioisotope or enzymatic
label such that binding of the test compound to the NOVX protein or biologically-active
portion thereof can be determined by detecting the labeled compound in a complex. For
example, test compounds can be labeled with 125I, 35S, 14C, or 3H, either directly or indirectly,
and the radioisotope detected by direct counting of radioemission or by scintillation counting.
Alternatively, test compounds can be enzymatically-labeled with, for example, horseradish
peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by
determination of conversion of an appropriate substrate to product. In one embodiment, the
assay comprises contacting a cell which expresses a membrane-bound form of NOVX
protein, or a biologically-active portion thereof, on the cell surface with a known compound
which binds NOVX to form an assay mixture, contacting the assay mixture with a test
compound, and determining the ability of the test compound to interact with a NOVX
protein, wherein determining the ability of the test compound to interact with a NOVX
protein comprises determining the ability of the test compound to preferentially bind to
NOVX protein or a biologically-active portion thereof as compared to the known compound.

In another embodiment, an assay is a cell-based assay comprising contacting a cell
expressing a membrane-bound form of NOVX protein, or a biologically-active portion
thereof, on the cell surface with a test compound and determining the ability of the test
compound to modulate (e.g., stimulate or inhibit) the activity of the NOVX protein or
biologically-active portion thereof. Determining the ability of the test compound to modulate
the activity of NOVX or a biologically-active portion thereof can be accomplished, for
example, by determining the ability of the NOVX protein to bind to or interact with a NOVX
target molecule. As used herein, a "target molecule" is a molecule with which a NOVX
protein binds or interacts in nature, for example, a molecule on the surface of a cell which
expresses a NOVX interacting protein, a molecule on the surface of a second cell, a molecule
in the extracellular milieu, a molecule associated with the internal surface of a cell membrane
or a cytoplasmic molecule. A NOVX target molecule can be a non-NOVX molecule or a
NOVX protein or polypeptide of the invention. In one embodiment, a NOVX target
molecule is a component of a signal transduction pathway that facilitates transduction of an
extracellular signal (e.g., a signal generated by binding of a compound to a membrane-bound
NOVX molecule) through the cell membrane and into the cell. The target, for example, can
be a second intercellular protein that has catalytic activity or a protein that facilitates the
association of downstream signaling molecules with NOVX.

Determining the ability of the NOVX protein to bind to or interact with a NOVX
target molecule can be accomplished by one of the methods described above for determining
direct binding. In one embodiment, determining the ability of the NOVX protein to bind to or
interact with a NOVX target molecule can be accomplished by determining the activity of the
target molecule. For example, the activity of the target molecule can be determined by
detecting induction of a cellular second messenger of the target (i.e., intracellular Ca2+,
diacylglycerol, IP3, etc.), detecting catalytic/enzymatic activity of the target on an appropriate
substrate, detecting the induction of a reporter gene (comprising a NOVX-responsive
regulatory element operatively linked to a nucleic acid encoding a detectable marker, e.g.,
luciferase), or detecting a cellular response, for example, cell survival, cellular differentiation,
or cell proliferation.

In yet another embodiment, an assay of the invention is a cell-free assay comprising
contacting a NOVX protein or biologically-active portion thereof with a test compound and
determining the ability of the test compound to bind to the NOVX protein or
biologically-active portion thereof. Binding of the test compound to the NOVX protein can
be determined either directly or indirectly as described above. In one such embodiment, the
assay comprises contacting the NOVX protein or biologically-active portion thereof with a
known compound which binds NOVX to form an assay mixture, contacting the assay mixture
with a test compound, and determining the ability of the test compound to interact with a
NOVX protein, wherein determining the ability of the test compound to interact with a
NOVX protein comprises determining the ability of the test compound to preferentially bind
to NOVX or biologically-active portion thereof as compared to the known compound.

In still another embodiment, an assay is a cell-free assay comprising contacting
NOVX protein or biologically-active portion thereof with a test compound and determining
the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of the
NOVX protein or biologically-active portion thereof. Determining the ability of the test
compound to modulate the activity of NOVX can be accomplished, for example, by
determining the ability of the NOVX protein to bind to a NOVX target molecule by one of
the methods described above for determining direct binding. In an alternative embodiment,
determining the ability of the test compound to modulate the activity of NOVX protein can
be accomplished by determining the ability of the NOVX protein to further modulate a NOVX
target molecule. For example, the catalytic/enzymatic activity of the target molecule on an
appropriate substrate can be determined as described, supra.

In yet another embodiment, the cell-free assay comprises contacting the NOVX
protein or biologically-active portion thereof with a known compound which binds NOVX
protein to form an assay mixture, contacting the assay mixture with a test compound, and
determining the ability of the test compound to interact with a NOVX protein, wherein
determining the ability of the test compound to interact with a NOVX protein comprises
determining the ability of the NOVX protein to preferentially bind to or modulate the activity
of a NOVX target molecule.

The cell-free assays of the invention are amenable to use of both the soluble form and
the membrane-bound form of NOVX protein. In the case of cell-free assays comprising the
membrane-bound form of NOVX protein, it may be desirable to utilize a solubilizing agent
such that the membrane-bound form of NOVX protein is maintained in solution. Examples
of such solubilizing agents include non-ionic detergents such as n-octylglucoside,
n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide,
decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®,
Isotridecypoly(ethylene glycol ether)n, N-dodecyl-N,N-dimethyl-3-ammonio-1-propane
sulfonate, 3-(3-cholamidopropyl)dimethylamminiol-1-propane sulfonate (CHAPS), or
3-(3-cholamidopropyl)dimethylamminiol-2-hydroxy-1-propane sulfonate (CHAPSO).

In more than one embodiment of the above assay methods of the invention, it may be
desirable to immobilize either NOVX protein or its target molecule to facilitate separation of
complexed from uncomplexed forms of one or both of the proteins, as well as to
accommodate automation of the assay. Binding of a test compound to NOVX protein, or
interaction of NOVX protein with a target molecule in the presence and absence of a
candidate compound, can be accomplished in any vessel suitable for containing the reactants.
Examples of such vessels include microtiter plates, test tubes, and micro-centrifuge tubes. In
one embodiment, a fusion protein can be provided that adds a domain that allows one or both
of the proteins to be bound to a matrix. For example, GST-NOVX fusion proteins or
GST-target fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma
Chemical, St. Louis, MO) or glutathione derivatized microtiter plates, that are then combined
with the test compound or the test compound and either the non-adsorbed target protein or
NOVX protein, and the mixture is incubated under conditions conducive to complex
formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads
or microtiter plate wells are washed to remove any unbound components, the matrix is
immobilized in the case of beads, and the complex is determined either directly or indirectly,
for example, as described, supra. Alternatively, the complexes can be dissociated from the
matrix, and the level of NOVX protein binding or activity determined using standard
techniques.

Other techniques for immobilizing proteins on matrices can also be used in the
screening assays of the invention. For example, either the NOVX protein or its target
molecule can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated
NOVX protein or target molecules can be prepared from biotin-NHS
(N-hydroxy-succinimide) using techniques well-known within the art (e.g., biotinylation kit,
Pierce Chemicals, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96 well
plates (Pierce Chemical). Alternatively, antibodies reactive with NOVX protein or target
molecules, but which do not interfere with binding of the NOVX protein to its target
molecule, can be derivatized to the wells of the plate, and unbound target or NOVX protein
trapped in the wells by antibody conjugation. Methods for detecting such complexes, in
addition to those described above for the GST-immobilized complexes, include
immunodetection of complexes using antibodies reactive with the NOVX protein or target
molecule, as well as enzyme-linked assays that rely on detecting an enzymatic activity
associated with the NOVX protein or target molecule.

In another embodiment, modulators of NOVX protein expression are identified in a
method wherein a cell is contacted with a candidate compound and the expression of NOVX
mRNA or protein in the cell is determined. The level of expression of NOVX mRNA or
protein in the presence of the candidate compound is compared to the level of expression of
NOVX mRNA or protein in the absence of the candidate compound. The candidate
compound can then be identified as a modulator of NOVX mRNA or protein expression
based upon this comparison. For example, when expression of NOVX mRNA or protein is
greater (i.e., statistically significantly greater) in the presence of the candidate compound than
in its absence, the candidate compound is identified as a stimulator of NOVX mRNA or
protein expression. Alternatively, when expression of NOVX mRNA or protein is less
(statistically significantly less) in the presence of the candidate compound than in its absence,
the candidate compound is identified as an inhibitor of NOVX mRNA or protein expression.
The level of NOVX mRNA or protein expression in the cells can be determined by methods
described herein for detecting NOVX mRNA or protein.
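
The comparison described above can be sketched in code. The following is only an illustration under stated assumptions: the text does not specify a statistical test, so a two-sided permutation test on the difference of means is swapped in here, and the function names, the replicate values and the 0.05 threshold are all hypothetical.

```python
import random


def _mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Assumption: this test is chosen for illustration only; the source
    text asks for statistical significance without naming a method.
    """
    rng = random.Random(seed)
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:k]) - _mean(pooled[k:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify_candidate(treated, control, alpha=0.05):
    """Label a candidate compound from expression levels measured
    in its presence (treated) and in its absence (control)."""
    p = permutation_p_value(treated, control)
    if p >= alpha:
        return "no significant effect"
    if _mean(treated) > _mean(control):
        return "stimulator"
    return "inhibitor"


# Hypothetical replicate expression levels (arbitrary units).
with_compound = [2.1, 2.3, 2.0, 2.4, 2.2, 2.5]
without_compound = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1]
print(classify_candidate(with_compound, without_compound))
```

With these made-up values every measurement in the presence of the compound exceeds every measurement in its absence, so the candidate is labeled a stimulator; swapping the two lists labels it an inhibitor.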

In yet another aspect of the invention, the NOVX proteins can be used as "bait
proteins" in a two-hybrid assay or three hybrid assay (see, e.g., U.S. Patent No. 5,283,317;
Zervos, et al., 1993. Cell 72: 223-232; Madura, et al., 1993. J. Biol. Chem. 268:
12046-12054; Bartel, et al., 1993. Biotechniques 14: 920-924; Iwabuchi, et al., 1993.
Oncogene 8: 1693-1696; and Brent WO 94/10300), to identify other proteins that bind to or
interact with NOVX ("NOVX-binding proteins" or "NOVX-bp") and modulate NOVX
activity. Such NOVX-binding proteins are also involved in the propagation of signals by the
NOVX proteins as, for example, upstream or downstream elements of the NOVX pathway.

The two-hybrid system is based on the modular nature of most transcription factors,
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes
two different DNA constructs. In one construct, the gene that codes for NOVX is fused to a
gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In
the other construct, a DNA sequence, from a library of DNA sequences, that encodes an
unidentified protein ("prey" or "sample") is fused to a gene that codes for the activation
domain of the known transcription factor. If the "bait" and the "prey" proteins are able to
interact, in vivo, forming a NOVX-dependent complex, the DNA-binding and activation
domains of the transcription factor are brought into close proximity. This proximity allows
transcription of a reporter gene (e.g., LacZ) that is operably linked to a transcriptional
regulatory site responsive to the transcription factor. Expression of the reporter gene can be
detected and cell colonies containing the functional transcription factor can be isolated and
used to obtain the cloned gene that encodes the protein which interacts with NOVX.
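
The selection logic of the two-hybrid system can be illustrated with a toy model. This sketch is not a laboratory protocol: the class and function names, the clone labels and the set of interacting pairs are all hypothetical, and the model assumes only what the paragraph above states, namely that the reporter is transcribed when a DNA-binding-domain fusion and an activation-domain fusion are brought together by a bait-prey interaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fusion:
    protein: str  # e.g. "NOVX" (bait) or a hypothetical library clone label (prey)
    domain: str   # "DBD" (DNA-binding domain) or "AD" (activation domain)


def reporter_expressed(bait, prey, interacting_pairs):
    """Return True when the reporter gene would be transcribed.

    Transcription needs the DNA-binding domain (on the bait) and the
    activation domain (on the prey) to be brought into proximity,
    which in this model happens only if bait and prey interact.
    """
    has_both_domains = bait.domain == "DBD" and prey.domain == "AD"
    interacts = frozenset((bait.protein, prey.protein)) in interacting_pairs
    return has_both_domains and interacts


def screen_library(bait, prey_library, interacting_pairs):
    """List the prey clones whose colonies would express the reporter."""
    return [
        prey.protein
        for prey in prey_library
        if reporter_expressed(bait, prey, interacting_pairs)
    ]


# Hypothetical example: only cloneB interacts with the NOVX bait.
bait = Fusion("NOVX", "DBD")
library = [Fusion("cloneA", "AD"), Fusion("cloneB", "AD"), Fusion("cloneC", "AD")]
known_interactions = {frozenset(("NOVX", "cloneB"))}
positives = screen_library(bait, library, known_interactions)
print(positives)
```

In the real assay the interacting pairs are of course unknown in advance; reporter expression is the observation from which they are inferred.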

The invention further pertains to novel agents identified by the aforementioned 
screening assays and uses thereof for treatments as described herein. 

Detection Assays 

Portions or fragments of the cDNA sequences identified herein (and the
corresponding complete gene sequences) can be used in numerous ways as polynucleotide
reagents. By way of example, and not of limitation, these sequences can be used to: (i) map
their respective genes on a chromosome; and, thus, locate gene regions associated with
genetic disease; (ii) identify an individual from a minute biological sample (tissue typing);
and (iii) aid in forensic identification of a biological sample. Some of these applications are
described in the subsections, below.

Chromosome Mapping 

Once the sequence (or a portion of the sequence) of a gene has been isolated, this 
sequence can be used to map the location of the gene on a chromosome. This process is 
called chromosome mapping. Accordingly, portions or fragments of the NOVX sequences of
SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, or fragments or derivatives
thereof, can be used to map the location of the NOVX genes, respectively, on a chromosome.
The mapping of the NOVX sequences to chromosomes is an important first step in
correlating these sequences with genes associated with disease. 

Briefly, NOVX genes can be mapped to chromosomes by preparing PCR primers
(preferably 15-25 bp in length) from the NOVX sequences. Computer analysis of the
NOVX sequences can be used to rapidly select primers that do not span more than one exon
in the genomic DNA, since primers that span exons would complicate the amplification process. These primers can then be
used for PCR screening of somatic cell hybrids containing individual human chromosomes. 
Only those hybrids containing the human gene corresponding to the NOVX sequences will 
yield an amplified fragment. 
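
As a minimal sketch of the computer-assisted primer selection described above, the following Python function enumerates candidate primers that lie entirely within a single exon. The function name, the half-open exon coordinates and the toy values are illustrative assumptions and are not taken from the NOVX sequences; a practical selection would also weigh melting temperature, GC content and primer specificity.

```python
def primers_within_single_exon(sequence, exon_bounds, min_len=15, max_len=25):
    """Return (start, end, primer) tuples for windows that stay inside one exon.

    exon_bounds is a hypothetical list of half-open (start, end) exon
    coordinates on the transcript sequence; it is an assumed input.
    """
    candidates = []
    for exon_start, exon_end in exon_bounds:
        for length in range(min_len, max_len + 1):
            # The last valid start keeps the whole window inside this exon.
            for start in range(exon_start, exon_end - length + 1):
                end = start + length
                candidates.append((start, end, sequence[start:end]))
    return candidates


# Toy 60-base transcript with an assumed exon junction at position 20.
toy_sequence = "ACGT" * 15
toy_exons = [(0, 20), (20, 60)]
toy_primers = primers_within_single_exon(toy_sequence, toy_exons, 18, 18)
```

Because each window is generated per exon, no candidate can straddle an exon junction, which is the property the paragraph above asks of the selected primers.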

Somatic cell hybrids are prepared by fusing somatic cells from different mammals
(e.g., human and mouse cells). As hybrids of human and mouse cells grow and divide, they
gradually lose human chromosomes in random order, but retain the mouse chromosomes. By
using media in which mouse cells cannot grow, because they lack a particular enzyme, but in
which human cells can, the one human chromosome that contains the gene encoding the
needed enzyme will be retained. By using various media, panels of hybrid cell lines can be
established. Each cell line in a panel contains either a single human chromosome or a small
number of human chromosomes, and a full set of mouse chromosomes, allowing easy
mapping of individual genes to specific human chromosomes. See, e.g., D'Eustachio, et al.,
1983. Science 220: 919-924. Somatic cell hybrids containing only fragments of human
chromosomes can also be produced by using human chromosomes with translocations and
deletions.

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular 
sequence to a particular chromosome. Three or more sequences can be assigned per day 
using a single thermal cycler. Using the NOVX sequences to design oligonucleotide primers,
sub-localization can be achieved with panels of fragments from specific chromosomes.
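
The panel-based assignment can be sketched as a concordance check: a sequence is assigned to the human chromosome whose presence or absence across the hybrid lines matches the PCR result in every line. The panel contents, hybrid names and PCR calls below are invented for illustration only, and the function name is an assumption.

```python
def concordant_chromosomes(panel, pcr_positive):
    """Return chromosomes whose retention pattern matches the PCR results.

    panel maps a hybrid line name to the set of human chromosomes it retains;
    pcr_positive maps the same names to True when an amplified fragment was seen.
    """
    all_chromosomes = set().union(*panel.values())
    matches = []
    for chromosome in sorted(all_chromosomes):
        # Concordant means: present in every positive line, absent from every negative one.
        if all((chromosome in retained) == pcr_positive[name]
               for name, retained in panel.items()):
            matches.append(chromosome)
    return matches


# Hypothetical four-line panel and PCR screening results.
toy_panel = {
    "hybrid_1": {"1", "7"},
    "hybrid_2": {"7", "X"},
    "hybrid_3": {"1", "X"},
    "hybrid_4": {"2"},
}
toy_results = {"hybrid_1": True, "hybrid_2": True, "hybrid_3": False, "hybrid_4": False}
toy_assignment = concordant_chromosomes(toy_panel, toy_results)
```

An empty result would suggest a discordant panel (for example, a false-negative PCR), and more than one result would suggest that the panel lacks the resolution to separate the candidates.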

Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase 
chromosomal spread can further be used to provide a precise chromosomal location in one 
step. Chromosome spreads can be made using cells whose division has been blocked in 
metaphase by a chemical like colcemid that disrupts the mitotic spindle. The chromosomes
can be treated briefly with trypsin, and then stained with Giemsa. A pattern of light and dark
bands develops on each chromosome, so that the chromosomes can be identified individually.
The FISH technique can be used with a DNA sequence as short as 500 or 600 bases.
However, clones larger than 1,000 bases have a higher likelihood of binding to a unique
chromosomal location with sufficient signal intensity for simple detection. Preferably 1,000
bases, and more preferably 2,000 bases, will suffice to get good results in a reasonable
amount of time. For a review of this technique, see, Verma, et al., Human Chromosomes:
A Manual of Basic Techniques (Pergamon Press, New York 1988).

Reagents for chromosome mapping can be used individually to mark a single
chromosome or a single site on that chromosome, or panels of reagents can be used for
marking multiple sites and/or multiple chromosomes. Reagents corresponding to noncoding
regions of the genes actually are preferred for mapping purposes. Coding sequences are more
likely to be conserved within gene families, thus increasing the chance of cross hybridizations
during chromosomal mapping.

Once a sequence has been mapped to a precise chromosomal location, the physical 
position of the sequence on the chromosome can be correlated with genetic map data. Such
data are found, e.g., in McKusick, Mendelian Inheritance in Man (available on-line
through Johns Hopkins University Welch Medical Library). The relationship between genes
and disease, mapped to the same chromosomal region, can then be identified through linkage
analysis (co-inheritance of physically adjacent genes), described in, e.g., Egeland, et al.,
1987. Nature, 325: 783-787.

Moreover, differences in the DNA sequences between individuals affected and 
unaffected with a disease associated with the NOVX gene, can be determined. If a mutation
is observed in some or all of the affected individuals but not in any unaffected individuals,
then the mutation is likely to be the causative agent of the particular disease. Comparison of 
affected and unaffected individuals generally involves first looking for structural alterations 
in the chromosomes, such as deletions or translocations that are visible from chromosome 
spreads or detectable using PCR based on that DNA sequence. Ultimately, complete
sequencing of genes from several individuals can be performed to confirm the presence of a
mutation and to distinguish mutations from polymorphisms. 

Tissue Typing 

The NOVX sequences of the invention can also be used to identify individuals from 
minute biological samples. In this technique, an individual's genomic DNA is digested with 
one or more restriction enzymes, and probed on a Southern blot to yield unique bands for
identification. The sequences of the invention are useful as additional DNA markers for
RFLP ("restriction fragment length polymorphisms," described in U.S. Patent No.
5,272,057).

Furthermore, the sequences of the invention can be used to provide an alternative 
technique that determines the actual base-by-base DNA sequence of selected portions of an 
individual's genome. Thus, the NOVX sequences described herein can be used to prepare
two PCR primers from the 5'- and 3'-termini of the sequences. These primers can then be 
used to amplify an individual's DNA and subsequently sequence it. 

Panels of corresponding DNA sequences from individuals, prepared in this manner, 
can provide unique individual identifications, as each individual will have a unique set of
such DNA sequences due to allelic differences. The sequences of the invention can be used
to obtain such identification sequences from individuals and from tissue. The NOVX 
sequences of the invention uniquely represent portions of the human genome. Allelic 
variation occurs to some degree in the coding regions of these sequences, and to a greater 
degree in the noncoding regions. It is estimated that allelic variation between individual
humans occurs with a frequency of about once per each 500 bases. Much of the allelic
variation is due to single nucleotide polymorphisms (SNPs), which include restriction 
fragment length polymorphisms (RFLPs). 

Each of the sequences described herein can, to some degree, be used as a standard 
against which DNA from an individual can be compared for identification purposes. Because
greater numbers of polymorphisms occur in the noncoding regions, fewer sequences are
necessary to differentiate individuals. The noncoding sequences can comfortably provide
positive individual identification with a panel of perhaps 10 to 1,000 primers that each yield a
noncoding amplified sequence of 100 bases. If coding sequences, such as those of SEQ ID
NO:2n-1, wherein n is an integer between 1 and 172, are used, a more appropriate number of
primers for positive individual identification would be 500-2,000.

Predictive Medicine 

The invention also pertains to the field of predictive medicine in which diagnostic 
assays, prognostic assays, pharmacogenomics, and monitoring clinical trials are used for 
prognostic (predictive) purposes to thereby treat an individual prophylactically. Accordingly, 
one aspect of the invention relates to diagnostic assays for determining NOVX protein and/or
nucleic acid expression as well as NOVX activity, in the context of a biological sample (e.g., 
blood, serum, cells, tissue) to thereby determine whether an individual is afflicted with a 



WO 03/023002 




PCT/US02/28539 



disease or disorder, or is at risk of developing a disorder, associated with aberrant NOVX 
expression or activity. The disorders include metabolic disorders, diabetes, obesity, infectious 
disease, anorexia, cancer-associated cachexia, cancer, neurodegenerative disorders, 
Alzheimer's Disease, Parkinson's Disorder, immune disorders, and hematopoietic disorders, 
and the various dyslipidemias, metabolic disturbances associated with obesity, the metabolic
syndrome X and wasting disorders associated with chronic diseases and various cancers. The 
invention also provides for prognostic (or predictive) assays for determining whether an 
individual is at risk of developing a disorder associated with NOVX protein, nucleic acid 
expression or activity. For example, mutations in a NOVX gene can be assayed in a
biological sample. Such assays can be used for prognostic or predictive purposes to thereby
prophylactically treat an individual prior to the onset of a disorder characterized by or 
associated with NOVX protein, nucleic acid expression, or biological activity. 

Another aspect of the invention provides methods for determining NOVX protein, 
nucleic acid expression or activity in an individual to thereby select appropriate therapeutic or
prophylactic agents for that individual (referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents (e.g., drugs) for therapeutic or 
prophylactic treatment of an individual based on the genotype of the individual (e.g., the 
genotype of the individual examined to determine the ability of the individual to respond to a 
particular agent).

Yet another aspect of the invention pertains to monitoring the influence of agents
(e.g., drugs, compounds) on the expression or activity of NOVX in clinical trials.

These and other agents are described in further detail in the following sections. 

Diagnostic Assays 

An exemplary method for detecting the presence or absence of NOVX in a biological 
sample involves obtaining a biological sample from a test subject and contacting the
biological sample with a compound or an agent capable of detecting NOVX protein or 
nucleic acid (e.g., mRNA, genomic DNA) that encodes NOVX protein such that the presence 
of NOVX is detected in the biological sample. An agent for detecting NOVX mRNA or 
genomic DNA is a labeled nucleic acid probe capable of hybridizing to NOVX mRNA or 
genomic DNA. The nucleic acid probe can be, for example, a full-length NOVX nucleic
acid, such as the nucleic acid of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172,
or a portion thereof, such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500
nucleotides in length and sufficient to specifically hybridize under stringent conditions to
NOVX mRNA or genomic DNA. Other suitable probes for use in the diagnostic assays of 
the invention are described herein. 

An agent for detecting NOVX protein is an antibody capable of binding to NOVX 
protein, preferably an antibody with a detectable label. Antibodies can be polyclonal, or
more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) 
can be used. The term "labeled", with regard to the probe or antibody, is intended to 
encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a 
detectable substance to the probe or antibody, as well as indirect labeling of the probe or
antibody by reactivity with another reagent that is directly labeled. Examples of indirect
labeling include detection of a primary antibody using a fluorescently-labeled secondary 
antibody and end-labeling of a DNA probe with biotin such that it can be detected with 
fluorescently-labeled streptavidin. The term "biological sample" is intended to include 
tissues, cells and biological fluids isolated from a subject, as well as tissues, cells and fluids 
present within a subject. That is, the detection method of the invention can be used to detect
NOVX mRNA, protein, or genomic DNA in a biological sample in vitro as well as in vivo. 
For example, in vitro techniques for detection of NOVX mRNA include Northern 
hybridizations and in situ hybridizations. In vitro techniques for detection of NOVX protein 
include enzyme linked immunosorbent assays (ELISAs), Western blots,
immunoprecipitations, and immunofluorescence. In vitro techniques for detection of NOVX
genomic DNA include Southern hybridizations. Furthermore, in vivo techniques for 
detection of NOVX protein include introducing into a subject a labeled anti-NOVX antibody. 
For example, the antibody can be labeled with a radioactive marker whose presence and 
location in a subject can be detected by standard imaging techniques. 

In one embodiment, the biological sample contains protein molecules from the test
subject. Alternatively, the biological sample can contain mRNA molecules from the test
subject or genomic DNA molecules from the test subject. A preferred biological sample is a 
peripheral blood leukocyte sample isolated by conventional means from a subject. 

In another embodiment, the methods further involve obtaining a control biological
sample from a control subject, contacting the control sample with a compound or agent
capable of detecting NOVX protein, mRNA, or genomic DNA, such that the presence of
NOVX protein, mRNA or genomic DNA is detected in the biological sample, and comparing
the presence of NOVX protein, mRNA or genomic DNA in the control sample with the
presence of NOVX protein, mRNA or genomic DNA in the test sample. 

The invention also encompasses kits for detecting the presence of NOVX in a 
biological sample. For example, the kit can comprise: a labeled compound or agent capable 
of detecting NOVX protein or mRNA in a biological sample; means for determining the
amount of NOVX in the sample; and means for comparing the amount of NOVX in the 
sample with a standard. The compound or agent can be packaged in a suitable container. 
The kit can further comprise instructions for using the kit to detect NOVX protein or nucleic 
acid. 

Prognostic Assays

The diagnostic methods described herein can furthermore be utilized to identify 
subjects having or at risk of developing a disease or disorder associated with aberrant NOVX 
expression or activity. For example, the assays described herein, such as the preceding 
diagnostic assays or the following assays, can be utilized to identify a subject having or at
risk of developing a disorder associated with NOVX protein, nucleic acid expression or
activity. Alternatively, the prognostic assays can be utilized to identify a subject having or at
risk for developing a disease or disorder. Thus, the invention provides a method for 
identifying a disease or disorder associated with aberrant NOVX expression or activity in 
which a test sample is obtained from a subject and NOVX protein or nucleic acid (e.g.,
mRNA, genomic DNA) is detected, wherein the presence of NOVX protein or nucleic acid is
diagnostic for a subject having or at risk of developing a disease or disorder associated with 
aberrant NOVX expression or activity. As used herein, a "test sample" refers to a biological 
sample obtained from a subject of interest. For example, a test sample can be a biological 
fluid (e.g., serum), cell sample, or tissue. 

Furthermore, the prognostic assays described herein can be used to determine whether
a subject can be administered an agent (e.g., an agonist, antagonist, peptidomimetic, protein,
peptide, nucleic acid, small molecule, or other drug candidate) to treat a disease or disorder 
associated with aberrant NOVX expression or activity. For example, such methods can be 
used to determine whether a subject can be effectively treated with an agent for a disorder.
Thus, the invention provides methods for determining whether a subject can be effectively
treated with an agent for a disorder associated with aberrant NOVX expression or activity in
which a test sample is obtained and NOVX protein or nucleic acid is detected (e.g., wherein
the presence of NOVX protein or nucleic acid is diagnostic for a subject that can be
administered the agent to treat a disorder associated with aberrant NOVX expression or 
activity). 

The methods of the invention can also be used to detect genetic lesions in a NOVX
gene, thereby determining if a subject with the lesioned gene is at risk for a disorder
characterized by aberrant cell proliferation and/or differentiation. In various embodiments,
the methods include detecting, in a sample of cells from the subject, the presence or absence
of a genetic lesion characterized by at least one of an alteration affecting the integrity of a
gene encoding a NOVX-protein, or the misexpression of the NOVX gene. For example, such
genetic lesions can be detected by ascertaining the existence of at least one of: (i) a deletion
of one or more nucleotides from a NOVX gene; (ii) an addition of one or more nucleotides to
a NOVX gene; (iii) a substitution of one or more nucleotides of a NOVX gene; (iv) a
chromosomal rearrangement of a NOVX gene; (v) an alteration in the level of a messenger
RNA transcript of a NOVX gene; (vi) aberrant modification of a NOVX gene, such as of the
methylation pattern of the genomic DNA; (vii) the presence of a non-wild-type splicing
pattern of a messenger RNA transcript of a NOVX gene; (viii) a non-wild-type level of a
NOVX protein; (ix) allelic loss of a NOVX gene; and (x) inappropriate post-translational
modification of a NOVX protein. As described herein, there are a large number of assay
techniques known in the art which can be used for detecting lesions in a NOVX gene. A
preferred biological sample is a peripheral blood leukocyte sample isolated by conventional
means from a subject. However, any biological sample containing nucleated cells may be
used, including, for example, buccal mucosal cells.

In certain embodiments, detection of the lesion involves the use of a probe/primer in a
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such
as anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegran, et al., 1988. Science 241: 1077-1080; and Nakazawa, et al., 1994. Proc. Natl.
Acad. Sci. USA 91: 360-364), the latter of which can be particularly useful for detecting point
mutations in the NOVX-gene (see, Abravaya, et al., 1995. Nucl. Acids Res. 23: 675-682).
This method can include the steps of collecting a sample of cells from a patient, isolating
nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample, contacting the
nucleic acid sample with one or more primers that specifically hybridize to a NOVX gene
under conditions such that hybridization and amplification of the NOVX gene (if present)
occurs, and detecting the presence or absence of an amplification product, or detecting the
size of the amplification product and comparing the length to a control sample. It is
anticipated that PCR and/or LCR may be desirable to use as a preliminary amplification step 
in conjunction with any of the techniques used for detecting mutations described herein. 

Alternative amplification methods include: self sustained sequence replication (see,
Guatelli, et al., 1990. Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional
amplification system (see, Kwoh, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 1173-1177);
Qβ Replicase (see, Lizardi, et al., 1988. BioTechnology 6: 1197), or any other nucleic acid
amplification method, followed by the detection of the amplified molecules using techniques
well known to those of skill in the art. These detection schemes are especially useful for the
detection of nucleic acid molecules if such molecules are present in very low numbers.

In an alternative embodiment, mutations in a NOVX gene from a sample cell can be 
identified by alterations in restriction enzyme cleavage patterns. For example, sample and 
control DNA is isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis and
compared. Differences in fragment length sizes between sample and control DNA indicate
mutations in the sample DNA. Moreover, sequence specific ribozymes (see, e.g.,
U.S. Patent No. 5,493,531) can be used to score for the presence of specific mutations by
development or loss of a ribozyme cleavage site.
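
The comparison of restriction cleavage patterns can be illustrated with a small sketch that models digestion as cutting a linear sequence string at every occurrence of a recognition site and then compares the fragment lengths of sample and control. The EcoRI site (GAATTC, cut after the first G) is used only as a familiar example; the toy sequences and the function name are assumptions, and a real digest acts on double-stranded DNA.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Return sorted fragment lengths after cutting at every occurrence of site.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cut_positions = []
    position = sequence.find(site)
    while position != -1:
        cut_positions.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    edges = [0] + cut_positions + [len(sequence)]
    return sorted(right - left for left, right in zip(edges, edges[1:]))


# Invented control sequence with two sites; the sample has lost the second site.
control_dna = "AAAAGAATTCTTTTTTTTGAATTCCC"
sample_dna = "AAAAGAATTCTTTTTTTTGAGTTCCC"
control_pattern = fragment_lengths(control_dna, "GAATTC", 1)
sample_pattern = fragment_lengths(sample_dna, "GAATTC", 1)
pattern_differs = control_pattern != sample_pattern
```

In this toy case the loss of one site merges two control fragments into a single longer sample fragment, which is the kind of difference that would appear on a gel.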

In other embodiments, genetic mutations in NOVX can be identified by hybridizing a
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing
hundreds or thousands of oligonucleotide probes. See, e.g., Cronin, et al., 1996. Human
Mutation 7: 244-255; Kozal, et al., 1996. Nat. Med. 2: 753-759. For example, genetic
mutations in NOVX can be identified in two dimensional arrays containing light-generated
DNA probes as described in Cronin, et al., supra. Briefly, a first hybridization array of
probes can be used to scan through long stretches of DNA in a sample and control to identify
base changes between the sequences by making linear arrays of sequential overlapping
probes. This step allows the identification of point mutations. This is followed by a second
hybridization array that allows the characterization of specific mutations by using smaller,
specialized probe arrays complementary to all variants or mutations detected. Each mutation
array is composed of parallel probe sets, one complementary to the wild-type gene and the
other complementary to the mutant gene.
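
The first scanning step can be sketched in simplified form as follows. The sketch tiles a reference sequence with sequential overlapping probes and treats hybridization as an exact string match at the same offset, which is an assumption made for brevity; it ignores strand complementarity, insertions and deletions, and it assumes at most a single substitution. All names and toy sequences are hypothetical.

```python
def tile_probes(reference, probe_len):
    """Sequential overlapping probes, one starting at each reference position."""
    return [(start, reference[start:start + probe_len])
            for start in range(len(reference) - probe_len + 1)]


def failed_probe_starts(reference, sample, probe_len):
    """Start positions of probes that do not perfectly match the sample."""
    return [start for start, probe in tile_probes(reference, probe_len)
            if sample[start:start + probe_len] != probe]


def localize_change(reference, sample, probe_len):
    """Half-open interval covered by every failed probe, or None if all match."""
    failed = failed_probe_starts(reference, sample, probe_len)
    if not failed:
        return None
    return (max(failed), min(failed) + probe_len)


# Invented 16-base reference; the sample carries a substitution at index 8.
toy_reference = "ACGTACGTACGTACGT"
toy_sample = "ACGTACGTCCGTACGT"
toy_failed = failed_probe_starts(toy_reference, toy_sample, 5)
toy_interval = localize_change(toy_reference, toy_sample, 5)
```

Because every probe that overlaps the changed base fails, the overlap of the failed probes narrows the change to a single position, which a second, mutation-specific probe set could then characterize.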

In yet another embodiment, any of a variety of sequencing reactions known in the art 
can be used to directly sequence the NOVX gene and detect mutations by comparing the
sequence of the sample NOVX with the corresponding wild-type (control) sequence.
Examples of sequencing reactions include those based on techniques developed by Maxam
and Gilbert, 1977. Proc. Natl. Acad. Sci. USA 74: 560 or Sanger, 1977. Proc. Natl. Acad. Sci.
USA 74: 5463. It is also contemplated that any of a variety of automated sequencing
procedures can be utilized when performing the diagnostic assays (see, e.g., Naeve, et al.,
1995. Biotechniques 19: 448), including sequencing by mass spectrometry (see, e.g., PCT
International Publication No. WO 94/16101; Cohen, et al., 1996. Adv. Chromatography 36:
127-162; and Griffin, et al., 1993. Appl. Biochem. Biotechnol. 38: 147-159).

Other methods for detecting mutations in the NOVX gene include methods in which
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or
RNA/DNA heteroduplexes. See, e.g., Myers, et al., 1985. Science 230: 1242. In general, the
art technique of "mismatch cleavage" starts by providing heteroduplexes formed by
hybridizing (labeled) RNA or DNA containing the wild-type NOVX sequence with
potentially mutant RNA or DNA obtained from a tissue sample. The double-stranded
duplexes are treated with an agent that cleaves single-stranded regions of the duplex such as
those which will exist due to basepair mismatches between the control and sample strands. For
instance, RNA/DNA duplexes can be treated with RNase and DNA/DNA hybrids treated
with S1 nuclease to enzymatically digest the mismatched regions. In other embodiments,
either DNA/DNA or RNA/DNA duplexes can be treated with hydroxylamine or osmium
tetroxide and with piperidine in order to digest mismatched regions. After digestion of the
mismatched regions, the resulting material is then separated by size on denaturing
polyacrylamide gels to determine the site of mutation. See, e.g., Cotton, et al., 1988. Proc.
Natl. Acad. Sci. USA 85: 4397; Saleeba, et al., 1992. Methods Enzymol. 217: 286-295. In an
embodiment, the control DNA or RNA can be labeled for detection.

In still another embodiment, the mismatch cleavage reaction employs one or more
proteins that recognize mismatched base pairs in double-stranded DNA (so called "DNA
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations in
NOVX cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli
cleaves A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells cleaves T
at G/T mismatches. See, e.g., Hsu, et al., 1994. Carcinogenesis 15: 1657-1662. According to
an exemplary embodiment, a probe based on a NOVX sequence, e.g., a wild-type NOVX
sequence, is hybridized to a cDNA or other DNA product from a test cell(s). The duplex is
treated with a DNA mismatch repair enzyme, and the cleavage products, if any, can be
detected from electrophoresis protocols or the like. See, e.g., U.S. Patent No. 5,459,039. 

In other embodiments, alterations in electrophoretic mobility will be used to identify 
mutations in NOVX genes. For example, single strand conformation polymorphism (SSCP)
may be used to detect differences in electrophoretic mobility between mutant and wild type
nucleic acids. See, e.g., Orita, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 2766; Cotton,
1993. Mutat. Res. 285: 125-144; Hayashi, 1992. Genet. Anal. Tech. Appl. 9: 73-79.
Single-stranded DNA fragments of sample and control NOVX nucleic acids will be
denatured and allowed to renature. The secondary structure of single-stranded nucleic acids
varies according to sequence; the resulting alteration in electrophoretic mobility enables the
detection of even a single base change. The DNA fragments may be labeled or detected with
labeled probes. The sensitivity of the assay may be enhanced by using RNA (rather than
DNA), in which the secondary structure is more sensitive to a change in sequence. In one
embodiment, the subject method utilizes heteroduplex analysis to separate double stranded
heteroduplex molecules on the basis of changes in electrophoretic mobility. See, e.g., Keen,
et al., 1991. Trends Genet. 7: 5.

In yet another embodiment, the movement of mutant or wild-type fragments in 
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient 
gel electrophoresis (DGGE). See, e.g., Myers, et al., 1985. Nature 313: 495. When DGGE is
used as the method of analysis, DNA will be modified to ensure that it does not completely
denature, for example by adding a GC clamp of approximately 40 bp of high-melting 
GC-rich DNA by PCR. In a further embodiment, a temperature gradient is used in place of a 
denaturing gradient to identify differences in the mobility of control and sample DNA. See, 
e.g., Rosenbaum and Reissner, 1987. Biophys. Chem. 265: 12753. 

Examples of other techniques for detecting point mutations include, but are not
limited to, selective oligonucleotide hybridization, selective amplification, or selective primer
extension. For example, oligonucleotide primers may be prepared in which the known
mutation is placed centrally and then hybridized to target DNA under conditions that permit
hybridization only if a perfect match is found. See, e.g., Saiki, et al., 1986. Nature 324: 163;
Saiki, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 6230. Such allele specific oligonucleotides
are hybridized to PCR amplified target DNA or a number of different mutations when the 
oligonucleotides are attached to the hybridizing membrane and hybridized with labeled target 
DNA. 
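
The allele specific oligonucleotide approach can be sketched as follows, under the simplifying assumption that stringent hybridization behaves as a perfect-match test between the probe and its complementary target. The reference sequence, the variant position, the flank length and all function names are hypothetical and serve only to illustrate central placement of the mutation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return sequence.translate(COMPLEMENT)[::-1]


def make_aso(reference, position, allele, flank=9):
    """Probe complementary to the target, with the allele at its centre."""
    if position - flank < 0 or position + flank >= len(reference):
        raise ValueError("variant too close to the sequence end for this flank")
    window = (reference[position - flank:position] + allele
              + reference[position + 1:position + flank + 1])
    return reverse_complement(window)


def hybridizes(probe, target):
    """Perfect-match-only model of hybridization under stringent conditions."""
    return reverse_complement(probe) in target


# Invented 36-base reference with an assumed C-to-T change at index 18.
toy_reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"
toy_mutant = toy_reference[:18] + "T" + toy_reference[19:]
wild_type_probe = make_aso(toy_reference, 18, "C")
mutant_probe = make_aso(toy_reference, 18, "T")
```

Placing the variable base at the centre of a short probe maximizes the destabilizing effect of a single mismatch, which is why the perfect-match model is a reasonable first approximation here.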
Alternatively, allele specific amplification technology that depends on selective PCR
amplification may be used in conjunction with the instant invention. Oligonucleotides used
as primers for specific amplification may carry the mutation of interest in the center of the
molecule (so that amplification depends on differential hybridization; see, e.g., Gibbs, et al.,
1989. Nucl. Acids Res. 17: 2437-2448) or at the extreme 3'-terminus of one primer where,
under appropriate conditions, mismatch can prevent or reduce polymerase extension (see,
e.g., Prossner, 1993. Tibtech. 11: 238). In addition it may be desirable to introduce a novel
restriction site in the region of the mutation to create cleavage-based detection. See, e.g.,
Gasparini, et al., 1992. Mol. Cell Probes 6: 1. It is anticipated that in certain embodiments
amplification may also be performed using Taq ligase for amplification. See, e.g., Barany,
1991. Proc. Natl. Acad. Sci. USA 88: 189. In such cases, ligation will occur only if there is a
perfect match at the 3'-terminus of the 5' sequence, making it possible to detect the presence
of a known mutation at a specific site by looking for the presence or absence of amplification.

The methods described herein may be performed, for example, by utilizing
pre-packaged diagnostic kits comprising at least one probe nucleic acid or antibody reagent
described herein, which may be conveniently used, e.g., in clinical settings to diagnose 
patients exhibiting symptoms or family history of a disease or illness involving a NOVX 
gene. 

Furthermore, any cell type or tissue, preferably peripheral blood leukocytes, in which 
NOVX is expressed may be utilized in the prognostic assays described herein. However, any
biological sample containing nucleated cells may be used, including, for example, buccal 
mucosal cells. 

Pharmacogenomics 

Agents, or modulators that have a stimulatory or inhibitory effect on NOVX activity 
(e.g., NOVX gene expression), as identified by a screening assay described herein can be
administered to individuals to treat (prophylactically or therapeutically) disorders. The 
disorders include but are not limited to, e.g., those diseases, disorders and conditions listed 
above, and more particularly include those diseases, disorders, or conditions associated with 
homologs of a NOVX protein, such as those summarized in Table A.

In conjunction with such treatment, the pharmacogenomics (i.e., the study of the
relationship between an individual's genotype and that individual's response to a foreign
compound or drug) of the individual may be considered. Differences in metabolism of
therapeutics can lead to severe toxicity or therapeutic failure by altering the relation between
dose and blood concentration of the pharmacologically active drug. Thus, the
pharmacogenomics of the individual permits the selection of effective agents (e.g., drugs) for
prophylactic or therapeutic treatments based on a consideration of the individual's genotype.
Such pharmacogenomics can further be used to determine appropriate dosages and
therapeutic regimens. Accordingly, the activity of NOVX protein, expression of NOVX
nucleic acid, or mutation content of NOVX genes in an individual can be determined to 
thereby select appropriate agent(s) for therapeutic or prophylactic treatment of the individual. 
Pharmacogenomics deals with clinically significant hereditary variations in the
response to drugs due to altered drug disposition and abnormal action in affected persons.
See, e.g., Eichelbaum, 1996. Clin. Exp. Pharmacol. Physiol., 23: 983-985; Linder, 1997. Clin.
Chem., 43: 254-266. In general, two types of pharmacogenetic conditions can be
differentiated: genetic conditions transmitted as a single factor altering the way drugs act on
the body (altered drug action), and genetic conditions transmitted as single factors altering the
way the body acts on drugs (altered drug metabolism). These pharmacogenetic conditions
can occur either as rare defects or as polymorphisms. For example, glucose-6-phosphate
dehydrogenase (G6PD) deficiency is a common inherited enzymopathy in which the main
clinical complication is hemolysis after ingestion of oxidant drugs (anti-malarials,
sulfonamides, analgesics, nitrofurans) and consumption of fava beans.

As an illustrative embodiment, the activity of drug metabolizing enzymes is a major
determinant of both the intensity and duration of drug action. The discovery of genetic
polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT 2) and
cytochrome P450 enzymes CYP2D6 and CYP2C19) has
provided an explanation as to why some patients do not obtain the expected drug effects or
show exaggerated drug response and serious toxicity after taking the standard and safe dose
of a drug. These polymorphisms are expressed in two phenotypes in the population, the
extensive metabolizer (EM) and poor metabolizer (PM). The prevalence of PM is different
among different populations. For example, the gene coding for CYP2D6 is highly
polymorphic and several mutations have been identified in PM, which all lead to the absence
of functional CYP2D6. Poor metabolizers of CYP2D6 and CYP2C19 quite frequently
experience exaggerated drug response and side effects when they receive standard doses. If a
metabolite is the active therapeutic moiety, PM show no therapeutic response, as
demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed metabolite
morphine. At the other extreme are the so-called ultra-rapid metabolizers who do not respond
to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been
identified to be due to CYP2D6 gene amplification.

Thus, the activity of NOVX protein, expression of NOVX nucleic acid, or mutation
content of NOVX genes in an individual can be determined to thereby select appropriate
agent(s) for therapeutic or prophylactic treatment of the individual. In addition,
pharmacogenetic studies can be used to apply genotyping of polymorphic alleles encoding
drug-metabolizing enzymes to the identification of an individual's drug responsiveness
phenotype. This knowledge, when applied to dosing or drug selection, can avoid adverse
reactions or therapeutic failure and thus enhance therapeutic or prophylactic efficiency when
treating a subject with a NOVX modulator, such as a modulator identified by one of the
exemplary screening assays described herein.

Monitoring of Effects During Clinical Trials 

Monitoring the influence of agents (e.g., drugs, compounds) on the expression or
activity of NOVX (e.g., the ability to modulate aberrant cell proliferation and/or
differentiation) can be applied not only in basic drug screening, but also in clinical trials. For
example, the effectiveness of an agent determined by a screening assay as described herein to
increase NOVX gene expression, protein levels, or upregulate NOVX activity, can be
monitored in clinical trials of subjects exhibiting decreased NOVX gene expression, protein
levels, or downregulated NOVX activity. Alternatively, the effectiveness of an agent
determined by a screening assay to decrease NOVX gene expression, protein levels, or
downregulate NOVX activity, can be monitored in clinical trials of subjects exhibiting
increased NOVX gene expression, protein levels, or upregulated NOVX activity. In such
clinical trials, the expression or activity of NOVX and, preferably, other genes that have been
implicated in, for example, a cellular proliferation or immune disorder can be used as a "read
out" or markers of the immune responsiveness of a particular cell.

By way of example, and not of limitation, genes, including NOVX, that are
modulated in cells by treatment with an agent (e.g., compound, drug or small molecule) that
modulates NOVX activity (e.g., identified in a screening assay as described herein) can be
identified. Thus, to study the effect of agents on cellular proliferation disorders, for example,
in a clinical trial, cells can be isolated and RNA prepared and analyzed for the levels of
expression of NOVX and other genes implicated in the disorder. The levels of gene
expression (i.e., a gene expression pattern) can be quantified by Northern blot analysis or
RT-PCR, as described herein, or alternatively by measuring the amount of protein produced,
by one of the methods as described herein, or by measuring the levels of activity of NOVX or
other genes. In this manner, the gene expression pattern can serve as a marker, indicative of
the physiological response of the cells to the agent. Accordingly, this response state may be
determined before, and at various points during, treatment of the individual with the agent.

In one embodiment, the invention provides a method for monitoring the effectiveness
of treatment of a subject with an agent (e.g., an agonist, antagonist, protein, peptide,
peptidomimetic, nucleic acid, small molecule, or other drug candidate identified by the
screening assays described herein) comprising the steps of (i) obtaining a pre-administration
sample from a subject prior to administration of the agent; (ii) detecting the level of
expression of a NOVX protein, mRNA, or genomic DNA in the pre-administration sample;
(iii) obtaining one or more post-administration samples from the subject; (iv) detecting the
level of expression or activity of the NOVX protein, mRNA, or genomic DNA in the
post-administration samples; (v) comparing the level of expression or activity of the NOVX
protein, mRNA, or genomic DNA in the pre-administration sample with the NOVX protein,
mRNA, or genomic DNA in the post-administration sample or samples; and (vi) altering the
administration of the agent to the subject accordingly. For example, increased administration
of the agent may be desirable to increase the expression or activity of NOVX to higher levels
than detected, i.e., to increase the effectiveness of the agent. Alternatively, decreased
administration of the agent may be desirable to decrease expression or activity of NOVX to
lower levels than detected, i.e., to decrease the effectiveness of the agent.
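
The comparison in steps (i) through (vi) can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the `Sample` type, the `compare_levels` function, the arbitrary expression units, and the 10% tolerance are all hypothetical choices made for illustration, and any actual change in administration would rest on clinical judgment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    label: str         # hypothetical label, e.g. "pre" or "post-1"
    novx_level: float  # measured NOVX expression or activity, arbitrary units


def compare_levels(pre, posts, goal, tolerance=0.10):
    """Compare pre- and post-administration NOVX levels (steps i-v).

    goal is "increase" or "decrease", i.e. the intended direction of change.
    Returns (fold_change, moved_as_intended); the flag is an input to the
    decision in step (vi), not a dosing recommendation.
    The 10% tolerance is an illustrative assumption.
    """
    if goal not in ("increase", "decrease"):
        raise ValueError("goal must be 'increase' or 'decrease'")
    if not posts:
        raise ValueError("at least one post-administration sample is required")
    if pre.novx_level <= 0:
        raise ValueError("pre-administration level must be positive")

    post_mean = mean(s.novx_level for s in posts)
    fold_change = post_mean / pre.novx_level
    if goal == "increase":
        moved = fold_change >= 1 + tolerance
    else:
        moved = fold_change <= 1 - tolerance
    return fold_change, moved


if __name__ == "__main__":
    pre = Sample("pre", 10.0)
    posts = [Sample("post-1", 14.0), Sample("post-2", 16.0)]
    print(compare_levels(pre, posts, goal="increase"))
```

Averaging the post-administration samples is only one possible summary; a time series per sample would fit the "various points during treatment" language equally well.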

Methods of Treatment 

The invention provides for both prophylactic and therapeutic methods of treating a
subject at risk of (or susceptible to) a disorder or having a disorder associated with aberrant
NOVX expression or activity. The disorders include, but are not limited to, those
diseases, disorders and conditions listed above, and more particularly include those diseases,
disorders, or conditions associated with homologs of a NOVX protein, such as those
summarized in Table A.

These methods of treatment will be discussed more fully, below.

Diseases and Disorders

Diseases and disorders that are characterized by increased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that antagonize (i.e., reduce or inhibit) activity. Therapeutics that antagonize
activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may
be utilized include, but are not limited to: (i) an aforementioned peptide, or analogs,
derivatives, fragments or homologs thereof; (ii) antibodies to an aforementioned peptide; (iii)
nucleic acids encoding an aforementioned peptide; (iv) administration of antisense nucleic
acid and nucleic acids that are "dysfunctional" (i.e., due to a heterologous insertion within the
coding sequences of an aforementioned peptide) that are utilized to
"knockout" endogenous function of an aforementioned peptide by homologous recombination
(see, e.g., Capecchi, 1989. Science 244: 1288-1292); or (v) modulators (i.e., inhibitors,
agonists and antagonists, including additional peptide mimetics of the invention or antibodies
specific to a peptide of the invention) that alter the interaction between an aforementioned
peptide and its binding partner.

Diseases and disorders that are characterized by decreased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that increase (i.e., are agonists to) activity. Therapeutics that upregulate activity
may be administered in a therapeutic or prophylactic manner. Therapeutics that may be
utilized include, but are not limited to, an aforementioned peptide, or analogs, derivatives,
fragments or homologs thereof; or an agonist that increases bioavailability.

Increased or decreased levels can be readily detected by quantifying peptide and/or
RNA, by obtaining a patient tissue sample (e.g., from biopsy tissue) and assaying it in vitro
for RNA or peptide levels, structure and/or activity of the expressed peptides (or mRNAs of
an aforementioned peptide). Methods that are well-known within the art include, but are not
limited to, immunoassays (e.g., by Western blot analysis, immunoprecipitation followed by
sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis, immunocytochemistry,
etc.) and/or hybridization assays to detect expression of mRNAs (e.g., Northern assays, dot
blots, in situ hybridization, and the like).

Prophylactic Methods

In one aspect, the invention provides a method for preventing, in a subject, a disease
or condition associated with an aberrant NOVX expression or activity, by administering to
the subject an agent that modulates NOVX expression or at least one NOVX activity.
Subjects at risk for a disease that is caused or contributed to by aberrant NOVX expression or
activity can be identified by, for example, any or a combination of diagnostic or prognostic
assays as described herein. Administration of a prophylactic agent can occur prior to the
manifestation of symptoms characteristic of the NOVX aberrancy, such that a disease or
disorder is prevented or, alternatively, delayed in its progression. Depending upon the type
of NOVX aberrancy, for example, a NOVX agonist or NOVX antagonist agent can be used
for treating the subject. The appropriate agent can be determined based on screening assays
described herein. The prophylactic methods of the invention are further discussed in the
following subsections.

Therapeutic Methods 

Another aspect of the invention pertains to methods of modulating NOVX expression
or activity for therapeutic purposes. The modulatory method of the invention involves
contacting a cell with an agent that modulates one or more of the activities of NOVX protein
associated with the cell. An agent that modulates NOVX protein activity can be an
agent as described herein, such as a nucleic acid or a protein, a naturally-occurring cognate
ligand of a NOVX protein, a peptide, a NOVX peptidomimetic, or other small molecule. In
one embodiment, the agent stimulates one or more NOVX protein activities. Examples of such
stimulatory agents include active NOVX protein and a nucleic acid molecule encoding
NOVX that has been introduced into the cell. In another embodiment, the agent inhibits one
or more NOVX protein activities. Examples of such inhibitory agents include antisense
NOVX nucleic acid molecules and anti-NOVX antibodies. These modulatory methods can
be performed in vitro (e.g., by culturing the cell with the agent) or, alternatively, in vivo (e.g.,
by administering the agent to a subject). As such, the invention provides methods of treating
an individual afflicted with a disease or disorder characterized by aberrant expression or
activity of a NOVX protein or nucleic acid molecule. In one embodiment, the method
involves administering an agent (e.g., an agent identified by a screening assay described
herein), or combination of agents that modulates (e.g., up-regulates or down-regulates)
NOVX expression or activity. In another embodiment, the method involves administering a
NOVX protein or nucleic acid molecule as therapy to compensate for reduced or aberrant
NOVX expression or activity.

Stimulation of NOVX activity is desirable in situations in which NOVX is abnormally
downregulated and/or in which increased NOVX activity is likely to have a beneficial effect.
One example of such a situation is where a subject has a disorder characterized by aberrant
cell proliferation and/or differentiation (e.g., cancer or immune associated disorders).
Another example of such a situation is where the subject has a gestational disease (e.g.,
preeclampsia).

Determination of the Biological Effect of the Therapeutic 

In various embodiments of the invention, suitable in vitro or in vivo assays are
performed to determine the effect of a specific Therapeutic and whether its administration is
indicated for treatment of the affected tissue.

In various specific embodiments, in vitro assays may be performed with
representative cells of the type(s) involved in the patient's disorder, to determine if a given
Therapeutic exerts the desired effect upon the cell type(s). Compounds for use in therapy
may be tested in suitable animal model systems including, but not limited to, rats, mice,
chickens, cows, monkeys, rabbits, and the like, prior to testing in human subjects. Similarly,
for in vivo testing, any of the animal model systems known in the art may be used prior to
administration to human subjects.

Prophylactic and Therapeutic Uses of the Compositions of the Invention 

The NOVX nucleic acids and proteins of the invention are useful in potential
prophylactic and therapeutic applications implicated in a variety of disorders. The disorders
include, but are not limited to, those diseases, disorders and conditions listed above, and
more particularly include those diseases, disorders, or conditions associated with homologs of
a NOVX protein, such as those summarized in Table A.

As an example, a cDNA encoding the NOVX protein of the invention may be useful
in gene therapy, and the protein may be useful when administered to a subject in need
thereof. By way of non-limiting example, the compositions of the invention will have
efficacy for treatment of patients suffering from diseases, disorders, conditions and the like,
including but not limited to those listed herein.

Both the novel nucleic acid encoding the NOVX protein, and the NOVX protein of
the invention, or fragments thereof, may also be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. A further use could
be as an anti-bacterial molecule (i.e., some peptides have been found to possess anti-bacterial
properties). These materials are further useful in the generation of antibodies, which
immunospecifically bind to the novel substances of the invention for use in therapeutic or
diagnostic methods.

The invention will be further described in the following examples, which do not limit
the scope of the invention described in the claims.

EXAMPLES 

Example A: Polynucleotide and Polypeptide Sequences, and Homology Data 

The NOV1 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 1A.

Table 1A. NOV1 Sequence Analysis

SEQ ID NO:1, 459 bp
NOV1a, CG101036-01 DNA Sequence

GGTCACTTCTTGGTCCTGGTCCAGGGCCTGTACTGACCACCTCCACGTGCCACTGGGGCTGTA
AGGAGGAATGGCGGCCGTGGGCAGCCTGCTTGGCCGGCTGAGGCAGAGCACCGTGAAGGCCAC
CGGACCTGCACTCCGCCGCCTGCACACATCCTCCTGGCGAGCTGACAGCAGCAGGGCCTCACT
CACTCGTGTGCACCGCCAGGCTTATGCACGACTCTACCCCGTGCTGCTGGTGAAGCAGGATGG
CTCCACCATCCACATCCGCTACAGGGAGCCACGGCGCATGCTGGCGATGCCCATAGATCTGGA
CACCCTGTCTCCTGAGGAGCGCCGGGCCAGGCTGCGGAAGCGTGGGGCTCAGCTCCAGTCGAG
GAAGGAGTACGAGCAGGAGCTCAGTGATGACTTGCATGTGGAGCGCTACCGACAGGTCTGGAC
CAGGACCAAGAAGTGACC

ORF Start: ATG at 71; ORF Stop: TGA at 455

SEQ ID NO:2, 128 aa, MW at 15008.1kD
NOV1a, CG101036-01 Protein Sequence

MAAVGSLLGRLRQSTVKATGPALRRLHTSSWRADSSRASLTRVHRQAYARLYPVLLVKQDGST
IHIRYREPRRMLAMPIDLDTLSPEERRARLRKRGAQLQSRKEYEQELSDDLHVERYRQVWTRT
KK
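
The relationship between the DNA sequence, the stated ORF coordinates, and the protein sequence in Table 1A can be checked with a short Python sketch. It assumes the standard genetic code and 1-based coordinates; the function name `translate_orf` is hypothetical and introduced only for this illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NOV1a DNA sequence (SEQ ID NO:1) as listed in Table 1A.
NOV1A_DNA = (
    "GGTCACTTCTTGGTCCTGGTCCAGGGCCTGTACTGACCACCTCCACGTGCCACTGGGGCTGTA"
    "AGGAGGAATGGCGGCCGTGGGCAGCCTGCTTGGCCGGCTGAGGCAGAGCACCGTGAAGGCCAC"
    "CGGACCTGCACTCCGCCGCCTGCACACATCCTCCTGGCGAGCTGACAGCAGCAGGGCCTCACT"
    "CACTCGTGTGCACCGCCAGGCTTATGCACGACTCTACCCCGTGCTGCTGGTGAAGCAGGATGG"
    "CTCCACCATCCACATCCGCTACAGGGAGCCACGGCGCATGCTGGCGATGCCCATAGATCTGGA"
    "CACCCTGTCTCCTGAGGAGCGCCGGGCCAGGCTGCGGAAGCGTGGGGCTCAGCTCCAGTCGAG"
    "GAAGGAGTACGAGCAGGAGCTCAGTGATGACTTGCATGTGGAGCGCTACCGACAGGTCTGGAC"
    "CAGGACCAAGAAGTGACC"
)


def translate_orf(dna, start_1based):
    """Translate from a 1-based start position to the first in-frame stop.

    Returns (protein, stop_position), where stop_position is the 1-based
    coordinate of the first base of the stop codon.
    """
    i = start_1based - 1
    protein = []
    while i + 3 <= len(dna):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            return "".join(protein), i + 1
        protein.append(aa)
        i += 3
    raise ValueError("no in-frame stop codon found")


protein, stop_pos = translate_orf(NOV1A_DNA, 71)
print(len(NOV1A_DNA), len(protein), stop_pos)
print(protein)
```

The same function can be applied to the other sequence tables by substituting the DNA string and the stated ORF start.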



Further analysis of the NOV1a protein yielded the following properties shown in
Table 1B.

Table 1B. Protein Sequence Properties NOV1a

PSort analysis: 0.5756 probability located in nucleus; 0.5070 probability located in mitochondrial
matrix space; 0.3000 probability located in microbody (peroxisome); 0.2297
probability located in mitochondrial inner membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV1a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 1C.
Table 1C. Geneseq Results for NOV1a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV1a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB97259 | Novel human protein SEQ ID NO: 527 - Homo sapiens, 128 aa. [WO200222660-A2, 21-MAR-2002] | 1..128 | 126/128 (98%); 126/128 (98%) | 1e-66
AAB63419 | Human breast cancer associated antigen protein sequence SEQ ID NO:781 - Homo sapiens, 130 aa. [WO200073801-A2, 07-DEC-2000] | 1..128; 3..130 | 126/128 (98%); 126/128 (98%) | 1e-66
AAU29307 | Human PRO polypeptide sequence #284 - Homo sapiens, 164 aa. [WO200168848-A2, 20-SEP-2001] | 5..128; 41..164 | 119/124 (95%); 119/124 (95%) | 3e-62
ABB69193 | Drosophila melanogaster polypeptide SEQ ID NO 34371 - Drosophila melanogaster, 107 aa. [WO200171042-A2, 27-SEP-2001] | 35..119; 18..100 | 37/85 (43%); 61/85 (71%) | 1e-15
AAM96532 | Human reproductive system related antigen SEQ ID NO: 5190 - Homo sapiens, 68 aa. [WO200155320-A2, 02-AUG-2001] | 5..46; 16..57 | 37/42 (88%); 37/42 (88%) | 4e-13

In a BLAST search of public sequence databases, the NOV1a protein was found to
have homology to the proteins shown in the BLASTP data in Table 1D.


Table 1D. Public BLASTP Results for NOV1a

Protein Accession Number | Protein/Organism/Length | NOV1a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9CZ83 | 2810038N09Rik protein - Mus musculus (Mouse), 127 aa. | 7..128; 6..127 | 89/122 (72%); 97/122 (78%) | 2e-44
Q8RO30 | Similar to mitochondrial ribosomal protein L55 - Mus musculus (Mouse), 134 aa. | 1..128; 7..134 | 89/128 (69%); 100/128 (77%) | 4e-44
Q9VE04 | CG14283 protein (Putative transcription factor) (RH10246p) - Drosophila melanogaster (Fruit fly), 107 aa. | 35..119; 18..100 | 37/85 (43%); 61/85 (71%) | 3e-15
Q9TYJ8 | Y66H1A.3 protein - Caenorhabditis elegans, 150 aa. | 32..124; 51..141 | 31/93 (33%); 48/93 (51%) | 5e-07
Q9UJF2 | Ras GTPase-activating protein nGAP (RAS protein activator like 1) - Homo sapiens (Human), 1139 aa. | 69..128; 938..998 | 20/61 (32%); 32/61 (51%) | 0.19

PFam analysis predicts that the NOV1a protein contains the domains shown in
Table 1E.

Table 1E. Domain Analysis of NOV1a

Pfam Domain | NOV1a Match Region | Identities/Similarities for the Matched Region | Expect Value

Example 2. 

The NOV2 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 2A. 



Table 2A. NOV2 Sequence Analysis

SEQ ID NO:3, 1067 bp
NOV2a, CG101055-01 DNA Sequence

CGGCTGCGCTTCTGCTGGAAACGCTTGCTGGCGCCTGTCACCGGTTCCCTCCATTTTGAAAGG 


GAAAAAGGCTCTCCCCACCCATTCCCCTGCCCCTAGGAGCTGGAGCCGGAGGAGCCGCGCTCA 


TGGCGTTCAGCCCGTGGCAGATCCTGTCCCCCGTGCAGTGGGCGAAATGGACGTGGTCTGCGG 
TACGCGGCGGGGCCGCCGGCGAGGACGAGGCTGGCGGGCCCGAGGGCGACCCCGAGGAGGAGG 
ATTCGCAAGCCGAGACCAAATCCTTGAGTTTCAGCTCGGATTCTGAAGGTAATTTTGAGACTC 
CTGAAGCTGAAACCCCGATCCGATCACCTTTCAAGGAGTCCTGTGATCCATCACTCGGATTGG 
CAGGACCTGGGGCCAAAAGCCAAGAATCACAAGAAGCTGATGAACAGCTTGTAGCAGAAGTGG 
TTGAAAAATGTTCATCTAAGACTTGTTCTAAACCTTCAGAAAATGAAGTGCCACAGCAGGCCA
TTGACTCTCACTCAGTCAAGAATTTCAGAGAAGAACCTGAACATGATTTTAGCAAAATTTCCA 
TCGTGAGGCCATTTTCAATAGAAACGAAGGATTCCACGGATATCTCGGCAGTCCTCGGAACAA 
AAGTAGCTCATGGCTGTGTAACTGCAGTCTCAGGCAAGGCTCTGCCTTCCAGCCCGCCAGACG 
CCCTCCAGGACGAGGCGATGACAGAAGGCAGCATGGGGGTCACCCTCGAGGCCTCCGCAGAAG 
CTGATCTAAAAGCTGGCAACTCCTGTCCAGAGCTTGTGCCCAGCAGAAGAAGCAAGCTGAGAA
AGCCCAAGCCTGTCCCCCTGAGGAAGAAAGCAATTGGAGGAGAGTTCTCAGACACCAACGCTG 
CTGTGGAGGGCACACCTCTCCCCAAGGCATCCTATCACTTCAGTCCTGAAGAGTTGGATGAGA 
ACACAAGTCCTTTGCTAGGAGATGCCAGGTTCCAGAAGTCTCCCCCTGACCTTAAAGAAACTC 
CCGGCACTCTCAGTAGTGACACCAACGACTCAGGGGTGGAGCTGGGGGCACGGTGAATG 




ORF Start: ATG at 126; ORF Stop: TGA at 1062

SEQ ID NO:4, 312 aa, MW at 33130.0kD
NOV2a, CG101055-01 Protein Sequence

MAFSPWQILSPVQWAKWTWSAVRGGAAGEDEAGGPEGDPEEEDSQAETKSLSFSSDSEGNFET
PEAETPIRSPFKESCDPSLGLAGPGAKSQESQEADEQLVAEVVEKCSSKTCSKPSENEVPQQA
IDSHSVKNFREEPEHDFSKISIVRPFSIETKDSTDISAVLGTKVAHGCVTAVSGKALPSSPPD 
ALQDEAMTEGSMGVTLEASAEADLKAGNSCPELVPSRRSKLRKPKPVPLRKKAIGGEFSDTNA 
AVEGTPLPKASYHFSPEELDENTSPLLGDARFQKSPPDLKETPGTLSSDTNDSGVELGAR 




SEQ ID NO:5, 2731 bp
NOV2b, CG101055-02 DNA Sequence

CAGAGGTCTAGCAGCCGGGCGCCGCGGGCCGGGGGCCTGAGGAGGCCACAGGACGGGCGTCTT 


CCCGGCTAGTGGAGCCCGGCGCGGGGCCCGCTGCGGCCGCACCGTGAGGGGAGGAGGCCGAGG 


AGGACGCGGCGCCGGCTGCCGGCGGGAGGAAGCGCTCCACCAGGGCCCCCGACGGCACTCGTT 


TAACCACATCCGCGCCTCTGCTGGAAACGCTTGCTGGCGCCTGTCACCGGTTCCCTCCATTTT 


GAAAGGGAAAAAGGCTCTCCCCACCCATTCCCCTGCCCCTAGGAGCTGGAGCCGGAGGAGCCG 
CGCTCATGGCGTTCAGCCCGTGGCAGATCCTGTCCCCCGTGCAGTGGGCGAAATGGACGTGGT
CTGCGGTACGCGGCGGGGCCGCCGGCGAGGACGAGGCTGGCGGGCCCGAGGGCGACCCCGAGG 
AGGAGGATTCGCAAGCCGAGACCAAATCCTTGAGTTTCAGCTCGGATTCTGAAGGTAATTTTG 
AGACTCCTGAAGCTGAAACCCCGATCCGATCACCTTTCAAGGAGTCCTGTGATCCATCACTCG 
GATTGGCAGGACCTGGGGCCAAAAGCCAAGAATCACAAGAAGCTGATGAACAGCTTGTAGCAG
AAGTGGTTGAAAAATGTTCATCTAAGACTTGTTCTAAACCTTCAGAAAATGAAGTGCCACAGC 
AGGCCATTGACTCTCACTCAGTCAAGAATTTCAGAGAAGAACCTGAACATGATTTTAGCAAAA 
TTTCCATCGTGAGGCCATTTTCAATAGAAACGAAGGATTCCACGGATATCTCGGCAGTCCTCG 
GAACAAAAGCAGCTCATGGCTGTGTAACTGCAGTCTCAGGCAAGGCTCTGCCTTCCAGCCCGC 
CAGACGCCCTCCAGGACGAGGCGATGACAGAAGGCAGCATGGGGGTCACCCTCGAGGCCTCCG 
CAGAAGCTGATCTAAAAGCTGGCAACTCCTGTCCAGAGCTTGTGCCCAGCAGAAGAAGCAAGC 
TGAGAAAGCCCAAGCCTGTCCCCCTGAGGAAGAAAGCAATTGGAGGAGAGTTCTCAGACACCA 

ATGAGAACACAAGTCCTTTGCTAGGAGATGCCAGGTTCCAGAAGTCTCCCCCTGACATTAAAG 
AAACTCCCGGCACTCTCAGTAGTGACACCAACGACTCAGGGGTTGAGCTGGGGGAGGAGTCGA 
GGAGCTCACCTCTCAAGCTTGAGTTTGATTTCACAGAAGATACAGGAAACATAGAGGCCAGGA 
AAGCCCTTCCAAGGAAGCTTGGCAGGAAACTGGGTAGCACACTGACTCCCAAGATACAAAAAG 
ATGGCATCAGTAAGTCAGCAGGTTTAGAACAGCCTACAGACCCAGTGGCACGAGACGGGCCTC 
TCTCCCAAACATCTTCCAAGCCAGATCCTAGTCAGTGGGAGAGCCCCAGCTTCAACCCCTTTG 
GGAGCCACTCTGTTCTGCAGAACTCCCCACCCCTCTCTTCTGAGGGCTCCTACCACTTTGACC 
CAGATAACTTTGACGAATCCATGGATCCCTTTAAACCAACTACGACCTTAACAAGCAGTGACT 
TTTGTTCTCCCACTGGTAATCACGTTAATGAAATCTTAGAATCACCCAAGAAGGCAAAGTCGC
GTTTAATAACGAGTGGCTGTAAGGTGAAGAAGCATGAAACTCAGTCTCTCGCCCTGGATGCAT 
GTTCTCGGGATGAAGGGGCAGTGATCTCCCAGATTTCAGACATTTCTAATAGGGATGGCCATG 
CTACTGATGAGGAGAAACTGGCATCCACGTCATGTGGTCAGAAATCAGCTGGTGCCGAGGTGA
AAGGTGAGCCAGAGGAAGACCTGGAGTACTTTGAATGTTCCAATGTTCCTGTGTCTACCATAA 
ATCATGCGTTTTCATCCTCAGAAGCAGGCATAGAGAAGGAGACGTGCCAGAAGATGGAAGAAG 
ACGGGTCCACTGTGCTTGGGCTGCTGGAGTCCTCTGCAGAGAAGGCCCCTGTGTCGGTGTCCT 
GTGGAGGTGAGAGCCCCCTGGATGGGATCTGCCTCAGCGAATCAGACAAGACAGCCGTGCTCA 
CCTTAATAAGAGAAGAGATAATTACTAAAGAGATTGAAGCAAATGAATGGAAGAAGAAATACG 
AAGAGACCCGGCAAGAAGTTTTGGAGATGAGGAAAATTGTAGCTGAATATGAAAAGACTATTG 
CTCAAATGATTGAAGATGAACAAAGGACAAGTATGACCTCTCAGAAGAGCTTCCAGCAACTGA 
CCATGGAGAAGGAACAGGCCCTGGCTGACCTTAACTCTGTGGAAAGGTCCCTTTCTGATCTCT 
TCAGGAGATATGAGAACCTGAAAGGTGTTCTGGAAGGGTTCAAGAAGAATGAAGAAGCCTTGA 
AGAAATGTGCTCAGGATTACTTAGCCAGAGTTAAACAAGAGGAGCAGCGATACCAGGCCCTGA
AAATCCACGCAGAAGAGAAACTGGACAAGTAAGAGCTTGTAAATGTTGAATTTCACTCTTCAT
GATGTTGTGGGAAGATTGAGAGAGGAAAACAAAATCACTGTTTCGCAACTCCAGGTTGTATTT 



TTATGTGTGTGTTTATTTCACTTTTTAAACCCTTTTCCCATTGTTAAAAAAAAAAAAAAAAAA 



AAAAAAAAAAAAAACCCAAAAA 



ORF Start: ATG at 321; ORF Stop: TAA at 2550

SEQ ID NO:6, 743 aa, MW at 80863.4kD
NOV2b, CG101055-02 Protein Sequence


MAFSPWQILSPVQWAKWTWSAVRGGAAGEDEAGGPEGDPEEEDSQAETKSLSFSSDSEGNFET 
PEAETPIRSPFKESCDPSLGLAGPGAKSQESQEADEQLVAEVVEKCSSKTCSKPSENEVPQQA
IDSHSVKNFREEPEHDFSKISIVRPFSIETKDSTDISAVLGTKAAHGCVTAVSGKALPSSPPD 
ALQDEAMTEGSMGVTLEASAEADLKAGNSCPELVPSRRSKLRKPKPVPLRKKAIGGEFSDTNA 
AVEGTPLPKASYHFSPEELDENTSPLLGDARFQKSPPDIKETPGTLSSDTNDSGVELGEESRS 
SPLKLEFDFTEDTGNIEARKALPRKLGRKLGSTLTPKIQKDGISKSAGLEQPTDPVARDGPLS 
QTSSKPDPSQWESPSFNPFGSHSVLQNSPPLSSEGSYHFDPDNFDESMDPFKPTTTLTSSDFC 
SPTGNHVNEILESPKKAKSRLITSGCKVKKHETQSLALDACSRDEGAVISQISDISNRDGHAT 
DEEKLASTSCGQKSAGAEVKGEPEEDLEYFECSNVPVSTINHAFSSSEAGIEKETCQKMEEDG 
STVLGLLESSAEKAPVSVSCGGESPLDGICLSESDKTAVLTLIREEIITKEIEANEWKKKYEE 
TRQEVLEMRKIVAEYEKTIAQMIEDEQRTSMTSQKSFQQLTMEKEQALADLNSVERSLSDLFR 
RYENLKGVLEGFKKNEEALKKCAQDYLARVKQEEQRYQALKIHAEEKLDK 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 2B. 

Table 2B. Comparison of NOV2a against NOV2b.

Protein Sequence | NOV2a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV2b | 1..310; 1..310 | 269/310 (86%); 270/310 (86%)



Further analysis of the NOV2a protein yielded the following properties shown in 
Table 2C. 



Table 2C. Protein Sequence Properties NOV2a

PSort analysis: 0.8800 probability located in nucleus; 0.4612 probability located in mitochondrial
matrix space; 0.3000 probability located in microbody (peroxisome); 0.1582
probability located in mitochondrial inner membrane

SignalP analysis: Cleavage site between residues 21 and 22



A search of the NOV2a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 2D.



Table 2D. Geneseq Results for NOV2a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV2a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM38680 | Human polypeptide SEQ ID NO 1825 - Homo sapiens, 1025 aa. [WO200153312-A1, 26-JUL-2001] | 1..295; 1..287 | 79/318 (24%); 118/318 (36%) | 2e-08
AAM38679 | Human polypeptide SEQ ID NO 1824 - Homo sapiens, 966 aa. [WO200153312-A1, 26-JUL-2001] | 1..295; 1..287 | 79/318 (24%); 118/318 (36%) | 2e-08
AAM38678 | Human polypeptide SEQ ID NO 1823 - Homo sapiens, 1013 aa. [WO200153312-A1, 26-JUL-2001] | 1..295; 1..287 | 79/318 (24%); 118/318 (36%) | 2e-08
ABG22566 | Novel human diagnostic protein #22557 - Homo sapiens, 637 aa. [WO200175067-A2, 11-OCT-2001] | 1..96; 56..138 | 33/96 (34%); 49/96 (50%) | 1e-07
ABG16821 | Novel human diagnostic protein #16812 - Homo sapiens, 124 aa. [WO200175067-A2, 11-OCT-2001] | 1..73; 56..115 | 29/73 (39%); 43/73 (58%) | 2e-07

In a BLAST search of public sequence databases, the NOV2a protein was found to
have homology to the proteins shown in the BLASTP data in Table 2E.



Table 2E. Public BLASTP Results for NOV2a

Protein Accession Number | Protein/Organism/Length | NOV2a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
O75410 | Transforming acidic coiled-coil-containing protein 1 - Homo sapiens (Human), 805 aa. | 1..310; 1..310 | 308/310 (99%); 309/310 (99%) | e-179
Q9PTG8 | CPEB-associated factor Maskin - Xenopus laevis (African clawed frog), 931 aa. | 215..308; 260..362 | 34/106 (32%); 47/106 (44%) | 0.015
Q8SY55 | GH09355p - Drosophila melanogaster (Fruit fly), 1514 aa. | 29..306; 875..1136 | 66/284 (23%); 117/284 (40%) | 0.034
Q8T4F6 | SD08609p - Drosophila melanogaster (Fruit fly), 944 aa. | 34..144; 562..678 | 32/123 (26%); 56/123 (45%) | 0.044
Q9VKQ6 | CG6729 protein - Drosophila melanogaster (Fruit fly), 944 aa. | 34..144; 562..678 | 32/123 (26%); 56/123 (45%) | 0.044



PFam analysis predicts that the NOV2a protein contains the domains shown in
Table 2F.

Table 2F. Domain Analysis of NOV2a

Pfam Domain | NOV2a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 3. 

The NOV3 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 3A.



Table 3A. NOV3 Sequence Analysis

SEQ ID NO:7, 478 bp
NOV3a, CG101973-01 DNA Sequence

CGGAAGAGGGGGTGAAGGCCAGAGGCTCGGGGCTTCAAGACCGCTGTCTGGAGTCCCCCTTTC 


CAGGCCATGTCGGGGCCCACCTGGCTGCCCCCGAAGCAGCCGGAGCCCGCCAGAGCCCCTCAG 
GGGAGGGCGATCCCCCGCGGCACCCCGGGGCCACCACCGGCCCACGGAGCAGGGGCTCCCTGC 
AGACAGGGGGGGCCTTCGCCCTGGAAGCCTGGACGCCGAGATAGACTTGCTGAGCAGCACGCT 
GGCCGAGCTGAATGGGGGTCGGGGTCATGCGTCACGGCGACCAGACCGACAGGCATATGAGCC 
CCCGCCACCTCCTGCCTACCGCACGGGCTCCCTGAAGCCAAATCCAGCCTCGCCGCTCCCAGC


GTCTCCCTATGGGGGCCCCACTCCAGCCTCTTACACTACCGCCAGCACCCCGGCTGGCCCAGC 


CTTCCCCGTGCAAGTGAAGGTGGCACAGCCAGTGAGG 




ORF Start: ATG at 70; ORF Stop: TGA at 310

SEQ ID NO:8, 80 aa, MW at 8277.3kD
NOV3a, CG101973-01 Protein Sequence

MSGPTWLPPKQPEPARAPQGRAIPRGTPGPPPAHGAGAPCRQGGPSPWKPGRRDRLAEQHAGR 
AEWGSGSCVTATRPTGI 




SEQ ID NO:9, 1673 bp
NOV3b, CG101973-02 DNA Sequence

CGGAAGAGGGGGTGAAGGCCAGAGGCTCGGGGCTTCAAGACCGCTGTCTGGAGTCCCCCTTTC 


CAGGCCATGTCGGGGCCCACCTGGCTGCCCCCGAAGCAGCCGGAGCCCGCCAGAGCCCCTCAG 
GGGAGGGCGATCCCCCGCGGCACCCCGGGGCCACCACCGGCCCACGGAGCAGCACTCCAGCCC 
CACCCCAGGGTCAATTTTTGCCCCCTTCCATCTGAGCAGTGTTACCAGGCCCCAGGGGGACCG 
GAGGATCGGGGGCCGGCGTGGGTGGGGTCCCATGGAGTACTCCAGCACACGCAGGGGCTCCCT 

GCAGACAGGGGGGGCCTTCGCCCTGGAAGCCTGGACGCCGAGATAGACTTGCTGAGCAGCACG

CTGGCCGAGCTGAATGGGGGTCGGGGTCATGCGTCACGGCGACCAGACCGACAGGCATATGAG

CCCCCGCCACCTCCTGCCTACCGCACGGGCTCCCTGAAGCCAAATCCAGCCTCGCCGCTCCCA 

GCGTCTCCCTATGGGGGCCCCACTCCAGCCTCTTACACTACCGCCAGCACCCCGGCTGGCCCA

GCCTTCCCCGTGCAAGTGAAGGTGGCACAGCCAGTGAGGGGCTGCGGCCCACCCAGGCGGGGA

GCCTCTCAGGCCTCTGGGCCCCTCCCGGGCCCCCACTTTCCTCTCCCAGGCCGAGGTGAAGTC

TGGGGGCCTGGCTATAGGAGCCAGAGAGAGCCAGGGCCAGGGGCCAAAGAGGAAGCTGCTGGG

GTCTCTGGCCCTGCAGGAAGAGGAAGAGGAGGCGAGCACGGGCCCCAGGTGCCCCTGAGCCAG

CCTCCAGAGGATGAGCTGGATAGGCTGACGAAGAAGCTGGTTCACGACATGAACCACCCGCCC 

AGCGGGGAGTACTTTGGCCAGTGTGGTGGCTGCGGAGAAGATGTGGTTGGGGATGGGGCTGGG 

GTTGTGGCCCTTGATCGCGTCTTTCACGTGGGCTGCTTTGTATGTTCTACATGCCGGGCCCAG 

CTTCGCGGCCAGCATTTCTACGCCGTGGAGAGGAGGGCATATTGCGAGGGCTGCTACGTGGCC 

ACCCTGGAGAAATGTGCCACGTGCTCCCAGCCCATCCTGGACCGGATCCTGCGGGCTATGGGG 

AAGGCCTACCACCCTGGCTGCTTCACCTGCGTGGTGTGTCACCGCGGCCTCGACGGCATCCCC 

TTCACAGTGGATGCTACGAGCCAGATCCACTGCATTGAGGACTTTCACAGGAAGTTTGCCCCA 

AGATGCTCAGTGTGCGGTGGGGCCATAATGCCTGAGCCAGGTCAGGAGGAGACTGTGAGAATT 

GTTGCTCTGGATCGAAGTTTTCACATTGGCTGTTACAAGTGCGAGGAGTGTGGGCTGCTGCTC 

TCCTCTGAGGGCGAGTGTCAGGGCTGCTACCCGCTGGATGGGCACATCTTGTGCAAGGCCTGC 

AGCGCCTGGCGCATCCAGGAGCTCTCAGCCACCGTCACCACTGACTGCTGAGTCTTCCTAGAA 

GTACCTGCTGGGTTCTCAGTTCCAGTTCCCATCCTTTGATTGATCACTCTCCCTGACATCCAC 


CTGTATGACTTTGTCACCAAATGCTGTCTTCTCTTTCTCCAATCAAGAAATAATAATCCCTCG 


AGTTTACAAAACAAAAAAAAAAAAAAAAAAAAAAA




ORF Start: ATG at 70 ORF Stop: TGA at 1498




SEQ ID NO: 10 476 aa MW at 50287.4kD


NOV3b, 

CG101973-02 Protein
Sequence 


MSGPTWLPPKQPEPARAPQGRAIPRGTPGPPPAHGAALQPHPRVNFCPLPSEQCYQAPGGPED 
RGPAWVGSHGVLQHTQGLPADRGGLRPGSLDAEIDLLSSTLAELNGGRGHASRRPDRQAYEPP 
PPPAYRTGSLKPNPASPLPASPYGGPTPASYTTASTPAGPAFPVQVKVAQPVRGCGPPRRGAS 
QASGPLPGPHFPLPGRGEVWGPGYRSQREPGPGAKEEAAGVSGPAGRGRGGEHGPQVPLSQPP 
EDELDRLTKKLVHDMNHPPSGEYFGQCGGCGEDVVGDGAGVVALDRVFHVGCFVCSTCRAQLR
GQHFYAVERRAYCEGCYVATLEKCATCSQPILDRILRAMGKAYHPGCFTCVVCHRGLDGIPFT
VDATSQIHCIEDFHRKFAPRCSVCGGAIMPEPGQEETVRIVALDRSFHIGCYKCEECGLLLSS 
EGECQGCYPLDGHILCKACSAWRIQELSATVTTDC 
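
The ORF coordinates and protein lengths reported in the table above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the ORF start is the 1-based position of the first base of the start codon and that the ORF stop is the 1-based position of the first base of the stop codon; this convention is inferred from the reported numbers rather than stated in the text, and the function name is illustrative only.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Return the number of residues encoded between two 1-based coordinates.

    Assumption: orf_start is the first base of the start codon and
    orf_stop is the first base of the stop codon (stop codon not translated).
    """
    coding_nt = orf_stop - orf_start
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    return coding_nt // 3


# Values as reported for NOV3a and NOV3b: (ORF start, ORF stop, protein length).
reported = {"NOV3a": (70, 310, 80), "NOV3b": (70, 1498, 476)}

for name, (start, stop, length_aa) in reported.items():
    assert orf_protein_length(start, stop) == length_aa, name
```

Under this assumed convention both entries are internally consistent: 240 coding nucleotides give 80 residues and 1428 coding nucleotides give 476 residues.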



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 3B. 



Table 3B. Comparison of NOV3a against NOV3b.


Protein Sequence 


NOV3a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV3b 


1..19
1..19


19/19(100%) 
19/19(100%) 
Further analysis of the NOV3a protein yielded the following properties shown in
Table 3C. 



Table 3C. Protein Sequence Properties NOV3a 


PSort 
analysis: 


0.8486 probability located in lysosome (lumen); 0.4500 probability located in 
cytoplasm; 0.2583 probability located in microbody (peroxisome); 0.1000 
probability located in mitochondrial matrix space 



SignalP 
analysis: 


No Known Signal Sequence Predicted 
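
A PSort result line of the form used in the table above can be split into location and score pairs for downstream comparison. The sketch below is illustrative only: the regular expression and the function name are assumptions, not part of any PSort tool, and the scores are treated as independent values rather than as a normalized distribution (those listed for NOV3a sum to more than 1).

```python
import re

# PSort text as reported for NOV3a in Table 3C.
PSORT_TEXT = (
    "0.8486 probability located in lysosome (lumen); "
    "0.4500 probability located in cytoplasm; "
    "0.2583 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space"
)


def parse_psort(text: str) -> list[tuple[str, float]]:
    """Return (location, score) pairs in the order they appear in the text."""
    pattern = re.compile(r"(\d+\.\d+) probability located in ([^;]+)")
    return [(location.strip(), float(score)) for score, location in pattern.findall(text)]


predictions = parse_psort(PSORT_TEXT)
# Pick the location with the highest reported score.
best_location, best_score = max(predictions, key=lambda item: item[1])
```

For the NOV3a entry, `best_location` is the lysosome (lumen) with a score of 0.8486.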




A search of the NOV3a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 3D.



Table 3D. Geneseq Results for NOV3a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV3a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM52307 


Human TRIP6 - Homo sapiens, 476 aa.
[WO200171356-A2, 27-SEP-2001]


1..36
1..36


36/36(100%) 
36/36(100%) 


7e-17 


AAM52308 


Murine TRIP6 - Mus musculus, 480 aa. 
[WO200171356-A2, 27-SEP-2001]


1..36
1..36


29/36 (80%) 
32/36 (88%) 


1e-12


AAR79480


Rat type II collagen - Rattus sp, 1442
aa. [WO9522611-A2, 24-AUG-1995]


1..79

952..1029


29/80 (36%) 
34/80 (42%) 


0.011


AAE16477


Human collagen alpha1(II) protein -
Homo sapiens, 1418 aa. [US6323314-B1, 27-NOV-2001]


1..79

928..1005


28/80 (35%) 
34/80 (42%) 


0.024 

ABB09627


Amino acid sequence of human
collagen type II alpha 1 - Homo sapiens,
1418 aa. [US6342361-B1, 29-JAN-2002]


1..79

928..1005


28/80 (35%) 
34/80 (42%) 


0.024 
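
Each Identities/Similarities entry in the table above gives a count of matching positions over the aligned length, followed by a percentage. The following minimal sketch checks that the listed percentages are reproduced by integer truncation of the ratio; truncation (rather than rounding) is an assumption inferred from these values, and the function name is hypothetical.

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Return the match percentage, truncated toward zero (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# (matches, aligned length, percentage as listed in Table 3D)
table_3d_entries = [
    (36, 36, 100),
    (29, 36, 80),
    (32, 36, 88),
    (29, 80, 36),
    (34, 80, 42),
    (28, 80, 35),
]

for matches, aligned_length, listed in table_3d_entries:
    assert percent_truncated(matches, aligned_length) == listed
```

Rounding to the nearest integer would give 81% for 29/36 and 89% for 32/36, so the listed values of 80% and 88% are the ones consistent with truncation.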



In a BLAST search of public sequence databases, the NOV3a protein was found to
have homology to the proteins shown in the BLASTP data in Table 3E.

Table 3E. Public BLASTP Results for NOV3a



Protein
Accession
Number


Protein/Organism/Length


NOV3a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


CAC94501 


Sequence 15 from Patent WO0171356 - 
Homo sapiens (Human), 476 aa. 


1..36 
1..36


36/36(100%) 
36/36 (100%) 


2e-16


Q9BUE5 


Similar to thyroid hormone receptor 
interactor 6 - Homo sapiens (Human), 
474 aa. 


1..36
1..36 


36/36(100%) 
36/36 (100%) 


2e-16 


Q9BXP3 


Thyroid receptor interacting protein 6 - 
Homo sapiens (Human), 476 aa. 


1..36 
1..36 


36/36(100%) 
36/36 (100%) 


2e-16


Q15654 


Thyroid receptor interacting protein 6 
(TRIP6) (OPA-interacting protein 1) 
(Zyxin related protein 1) (ZRP-1) -
Homo sapiens (Human), 476 aa. 


1..36 
1..36 


36/36(100%) 
36/36(100%) 


2e-16 


Q9Z1Y4 


Zyxin related protein- 1 (Thyroid 
hormone receptor interactor 6) (TRIP6) - 
Mus musculus (Mouse), 480 aa. 


1..36
1..36 


29/36 (80%) 
32/36 (88%) 


3e-12



PFam analysis predicts that the NOV3a protein contains the domains shown in
Table 3F.



Table 3F. Domain Analysis of NOV3a 



Pfam Domain 



NOV3a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 





Example 4. 

The NOV4 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 4A. 



Table 4A. NOV4 Sequence Analysis



SEQ ID NO: 11 423 bp


NOV4a,

CG102244-01 DNA
Sequence 


GGGTCCTGGGGCCCAGCGCGGGTGGCCGCCGCGGCCCCTCGGGCTGCGTGGGGAGGGGGCTTC 


CGCCCCTGTTGTCATTGCTCCTGCAGCCTTTTCGCTGGGACTGCGCGACACCGCCCCCCGACC 


GGGTGCCCGCTGTGTGCCAGGCCGGGTGCTGGGCACGGTCCCGCGAGTGCCCTATAAGGACTG 


CCAGGCAATAATGAAGGTTCTTTTACTGAAGGATGCGAAGGAAGATGACTGTGGCCAGGATCC
GTATATCAGGGAATTAGGATTATATGGACTTGAAGCCACTTTGATCCCTGTTTTATCGTTTGA 
GTTTTTGTCTCTTCCCAGTTTCTCTGAGAAGTCTGGGAAAGGTCTCTGAAAGAAAAATGGAAT 
GCCAAGTCAGTGTATGTGGTTGGAAATGCTACTGCTTCTCTAGTG




ORF Start: ATG at 200 ORF Stop: TGA at 362


SEQ ID NO: 12 


54 aa MW at 5975.9kD


NOV4a,
CG102244-01
Protein Sequence


MKVLLLKDAKEDDCGQDPYIRELGLYGLEATLIPVLSFEFLSLPSFSEKSGKGL 


SEQ ID NO: 13


1086 bp 




NOV4b,

CG102244-02 DNA
Sequence


CCACAGAGGGCAGTCACGTGCCCGCTGTGTGCCAGGCCGGGTGCTGGGCACGGTCCCGCGAGT 


GCCCTATAAGGACTGCCAGGCAATAATGAAGGTTCTTTTACTGAAGGATGCGAAGGAAGATGA 
CTGTGGCCAGGATCCGTATATCAGGGAATTAGGATTATATGGACTTGAAGCCACTTTGATCCC 
TGTTTTATCGTTTGAGTTTTTGTCTCTTCCCAGTTTCTCTGAGAAGCTTTCTCATCCTGAAGA 
TTACGGGGGACTCATTTTTACCAGCCCCAGAGCAGTGGAAGCAGCAGAGTTATGTTTGGAGCA 
AAACAATAAAACTGAAGTCTGGGAAAGGTCTCTGAAAGAAAAATGGAATGCCAAGTCAGTGTA 
TGTGGTTGGAAATGCTACTGCTTCTCTAGTGAGTAAAATTGGCCTGAATACAGAAGGAGAAAC 
CTGTGGAAATGCAGAAAAGCTTGCAGAATATATTTGTTCCAGGGAGTCCTCAGCACTGCCTCT
TCTATTTCCCTGTGGAAACCTCAAAAGAGAAATCCTGCCAAAAGCGCTCAAGGACAAAGGGAT 
TGCCATGGAAAGCATAACTGTGTATCAGACAGTTGCACACCCAGGAATCCAAGGGAACCTGAA 
CAGCTACTATTCCCAGCAGGGGGTTCCAGCCAGCATCACATTTTTTAGTCCCTCTGGCCTCAC 
ATACAGTCTCAAGCACATTCAGGAGTTATCTGGTGACAATATCGATCAAATTAAGTTTGCAGC 
CATCGGCCCCACTACGGCTCGCGCGCTGGCCGCCCAGGGCCTTCCTGTAAGCTGCACTGCAGA 
GAGCCCCACGCCACAAGCCCTGGCCACTGGCATCAGGAAGGCTCTCCAGCCCCATGGCTGCTG 
CTGAGTCAGCCACCTAGCGCTGGCCCCATGCAGCCTCCCTGGGCTGGGCTGGCTCTGGATGGA 




GCCAGGCATCGGCAAGGGCTCTCGGGAGCTGCTGCCGTCAGACTCCTGCCTCAAGCCTGAGTG 


GAAGCACCTGAGGACCGGGGATCGGGACCTGACCTGGGGCTGGCCTCAGGCCCACGTGCACGT 


GACTGCCCTCTGTGG 




ORF Start: ATG at 89


ORF Stop: TGA at 884



SEQ ID NO: 14 265 aa MW at 28626.3kD


NOV4b,

CG102244-02 Protein
Sequence


MKVLLLKDAKEDDCGQDPYIRELGLYGLEATLIPVLSFEFLSLPSFSEKLSHPEDYGGLIFTS 
PRAVEAAELCLEQNNKTEVWERSLKEKWNAKSVYVVGNATASLVSKIGLNTEGETCGNAEKLA
EYICSRESSALPLLFPCGNLKREILPKALKDKGIAMESITVYQTVAHPGIQGNLNSYYSQQGV 
PASITFFSPSGLTYSLKHIQELSGDNIDQIKFAAIGPTTARALAAQGLPVSCTAESPTPQALA 
TGIRKALQPHGCC



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 4B. 



Table 4B. Comparison of NOV4a against NOV4b.


Protein Sequence


NOV4a Residues/
Match Residues


Identities/

Similarities for the Matched Region


NOV4b


1..35
1..35


35/35 (100%) 
35/35 (100%) 




Further analysis of the NOV4a protein yielded the following properties shown in 
Table 4C. 



Table 4C. Protein Sequence Properties NOV4a


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0480 probability 
located in microbody (peroxisome) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
A search of the NOV4a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 4D.



Table 4D. Geneseq Results for NOV4a 



Geneseq 
Identifier 


Protein/Organism/Length
[Patent #, Date]


NOV4a
Residues/
Match
Residues


Identities/
Similarities for the
Matched Region


Expect
Value





In a BLAST search of public sequence databases, the NOV4a protein was found to
have homology to the proteins shown in the BLASTP data in Table 4E.



Table 4E. Public BLASTP Results for NOV4a








Protein
Accession
Number


Protein/Organism/Length


NOV4a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect
Value


P10746


Uroporphyrinogen-III synthase (EC 
4.2.1.75) (UROS) (Uroporphyrinogen-III
cosynthetase) (Hydroxymethylbilane
hydrolyase [cyclizing]) (UROIIIS) - Homo
sapiens (Human), 265 aa. 


1..49 
1..49 


49/49(100%) 
49/49(100%) 


2e-21


P51163


Uroporphyrinogen-III synthase (EC
4.2.1.75) (UROS) (Uroporphyrinogen-III
cosynthetase) (Hydroxymethylbilane
hydrolyase [cyclizing]) (UROIIIS) - Mus
musculus (Mouse), 265 aa.


1..49
1..49


42/49 (85%)
44/49 (89%)


4e-16


Q9CW78 


Uroporphyrinogen III synthase - Mus 
musculus (Mouse), 27 aa (fragment). 


1..27 
1..27 


22/27 (81%)
23/27 (84%)


9e-05


Q9JLU5


Uroporphyrinogen III synthase (EC 
4.2.1.75) - Mus musculus (Mouse), 21 aa 
(fragment). 


1..21
1..21


18/21 (85%) 
19/21 (89%) 


0.005 



PFam analysis predicts that the NOV4a protein contains the domains shown in
Table 4F.

Table 4F. Domain Analysis of NOV4a






Pfam Domain 


NOV4a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 





Example 5. 

The NOV5 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 5A. 

Table 5A. NOV5 Sequence Analysis


SEQ ID NO: 15


1073 bp


NOV5a,

CG102713-01 DNA
Sequence 


GACACCCCTCTCTGTGACTCAGTCTCTGAGCGTTTTAATACGATGGTGTCCCCGCGGGATCAA 


ACTTCAGCGTCACAGCTGAGGACTGGCTTCGTGGTCCCTGATGGGAGAGCATGAACAGGTGGT


ATGTGAAGCCCTTGGAGACCAGCTCTTCCAAAGTCAAAGCCAAGACCATTGTGATGATTCTCG 
ACTCCCAGAAGCTCCTGCGATGTGAACTTGAGTCACTCAAGAGCCAGTTACAGGCCCAGACCA
AGGCTTTCGAGTTCCTGAACCACTCAGTGACCATGTTGGAGAAGGAGAGCTGCTTGCAGCAAA
TCAAGATTCAGCAGCTTGAAGAGGTGCTGAGCCCCACAGGCCGCCAGGGAGAGAAGGAGGAGC 
ACAAGTGGGGCATGGAGCAGGGCCGGCAGGAGCTGTATGGGGCCCTGACCCAAGGCCTTCAGG 
GGCTGGAGAAGACCCTGCGTGACAGTGAGGAGATGCAGCGGGCCCGCACCACTCGCTGCCTGC 
AGCTGCTGGCCCAGGAGATCCGGGACAGCAAGAAGTTCCTGTGGGAGGAGCTGGAACTGGTGC 
GGGAGGAGGTGACCTTCATCTATCAGAAGCTCCAAGCGCAGGAGGATGAGATCTCAGAGAACT 
TGGTGAACATTCAGAAAATGCAGAAAACGCAGGTGAAATGCCGCAAAATCCTGACCAAGATGA
AGCAGCAGGGTCATGAGACAGCCGCCTGTCCGGAGACTGAAGAGATACCGCAGGGAGCCAGTG 
GCTGCTGGAAGGATGACCTCCAGAAGGAACTGAGTGATATATGGTCTGCTGTGCACGTGCTGC 
AGAACTCCATAGACAGCCTCACTTTGTGCTCGGGGGCCTGTCCCAAGGCCTCGAGCCTAAGAG 
GCCACAAGGGGCACCAGTGCCTGAGCCCTCCACTCCCCTCCTGGGACTCTGACTCCGACTCTG 
ACCAGGACCTCTCCCAGCCACCTTTCAGCAAGAGTGGCCGCTCCTTCCCACCCGCTTGAGCAG 
CCGGGACTGCTCTCCCTGAAGACCCCTCCAGAGAGAAAATAAACTAGCCCAGACCCTCCTCTA 


AA


ORF Start: ATG at 114 ORF Stop: TGA at 1002




SEQ ID NO: 16 296 aa MW at 33734.1kD


NOV5a,

CG102713-01 Protein
Sequence 


MNRWYVKPLETSSSKVKAKTIVMILDSQKLLRCELESLKSQLQAQTKAFEFLNHSVTMLEKES 
CLQQIKIQQLEEVLSPTGRQGEKEEHKWGMEQGRQELYGALTQGLQGLEKTLRDSEEMQRART 
TRCLQLLAQEIRDSKKFLWEELELVREEVTFIYQKLQAQEDEISENLVNIQKMQKTQVKCRKI 
LTKMKQQGHETAACPETEEIPQGASGCWKDDLQKELSDIWSAVHVLQNSIDSLTLCSGACPKA 
SSLRGHKGHQCLSPPLPSWDSDSDSDQDLSQPPFSKSGRSFPPA 




SEQ ID NO: 17


1382 bp


NOV5b,

CG102713-02 DNA
Sequence


TTCAGCGTCACAGCTGAGGACTGGCTTCGTGGTCCCTGATGGGAGAGCATGAACAGGACCCTC


TTTTGGCCGGCCTACCCCGGGACCCTGACTACTCTGTGTCCTGCCTCTACTCACCTCCCTCAA 
GGAAGCCCTCACCCCTAGGCTCTTCCTCGGACAAGGCTCTGGAGCGTACAGCTCACTGGTCCA 
GGACTCCAGAGCCAGAGACCTTGGGATGCCCTGCTTCTGGGGACACAGTGAGGACTGCAGACT 
GCAGGCCAGGGTGGGGCTCAGGGCCTTCGCCACATGAGGCTGCCCCCTCCCCCAGTCCAGACC 
TGCAGAAGCAGTGCTGTAATGACCAGGACATTTTGAAGAGGCATCACAACGTAGCTAAGGTCA 
CCCCCTCTTCTCTGCCCACAGCCAAGACCATTGTGATGATTCCCGACTCCCAGAAGCTCCTGC 
GATGTGAACTTGAGTCACTCAAGAGCCAGTTACAGGCCCAGACCAAGGCTTTCGAGTTCCTGA 
ACCACTCAGTGACCATGTTGGAGAAGGAGAGCTGCTTGCAGCAAATCAAGATTCAGCAGCTTG 
AAGAGGTGCTGAGCCCCACAGGCCGCCAGGGAGAGAAGGAGGAGCACAAGTGGGGCATGGAGC 
AGGGCCGGCAGGAGCTGTATGGGGCCCTGACCCAAGGCCTTCAGGGGCTGGAGAAGACCCTGC 
GTGACAGTGAGGAGATGCAGCGGGCCCGCACCACTCGCTGCCTGCAGCTGCTGGCCCAGGAGA 
TCCGGGACAGCAAGAAGTTCCTGTGGGAGGAGCTGGAACTGGTGCGGGAGGAGGTGACCTTCA 
TCTATCAGAAGCTCCGTGAGCAGGAGGATGAGATCTCAGAGAACTTGGTGAACATTCAGAAAA 
TGCAGAAAACGCAGGTGAAATGCCGCAAAGTGCTGACCAAGATGAAGCAGCAGGGTCATGAGA
CAGCCGCCTGTCCGGAGACTGAAGAGATACCGCAGGGAGCCAGTGGCTGCTGGAAGGATGACC


TCCAGAAGGAACTGAGTGATATATGGTCTGCTGTGCACGTGCTGCAGAACTCCATAGACAGCC 
TCACTTTGTGCTCGGGGGCCTGTCCCAAGGCCTCGAGCCTAAGAGGCCACAAGGGGCACCAGT 
GCCTGAGCCCTCCACTCCCCTCCTGGGACTCTGACTCCGACTGTGACCAGGACCTCTCCCAGC 
CACCTTTCAGCAAGAGCGGCCGCTCCTTCCCACCCGGTGCAGATCCTCCCCAGTCCCCCCCTC 
CACCCATTTCCCTCCTGACCTGCCCTTGACTCCCACAGTCTAGACCCTCAGAACAGGCCCAGA 


TCCTGATCTGGGCATGCGGTCCCTGACCTGCAGCCCCGGAGTCCCCTGGACCTGGCCAG 


ORF Start: ATG at 39


ORF Stop: TGA at 1287




SEQ ID NO: 18 416 aa MW at 46300.7kD


NOV5b,

CG102713-02 Protein
Sequence 


MGEHEQDPLLAGLPRDPDYSVSCLYSPPSRKPSPLGSSSDKALERTAHWSRTPEPETLGCPAS 
GDTVRTADCRPGWGSGPSPHEAAPSPSPDLQKQCCNDQDILKRHHNVAKVTPSSLPTAKTIVM 
IPDSQKLLRCELESLKSQLQAQTKAFEFLNHSVTMLEKESCLQQIKIQQLEEVLSPTGRQGEK
EEHKWGMEQGRQELYGALTQGLQGLEKTLRDSEEMQRARTTRCLQLLAQEIRDSKKFLWEELE 
LVREEVTFIYQKLREQEDEISENLVNIQKMQKTQVKCRKVLTKMKQQGHETAACPETEEIPQG 
ASGCWKDDLQKELSDIWSAVHVLQNSIDSLTLCSGACPKASSLRGHKGHQCLSPPLPSWDSDS 
DCDQDLSQPPFSKSGRSFPPGADPPQSPPPPISLLTCP 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 5B. 



Table 5B. Comparison of NOV5a against NOV5b. 


Protein Sequence 


NOV5a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV5b 


11..293
114..396


245/283 (86%) 
247/283 (86%) 
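
The residue ranges in the table above can be checked against the denominator of the Identities/Similarities entries. The sketch below assumes that a range written as `a..b` is inclusive at both ends; the helper name is illustrative only.

```python
def span_length(residue_range: str) -> int:
    """Return the number of residues in an inclusive 'start..end' range."""
    start_text, end_text = residue_range.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


# Ranges reported for the NOV5a / NOV5b comparison.
nov5a_span = span_length("11..293")
nov5b_span = span_length("114..396")
```

Both spans come to 283 residues, which agrees with the aligned length of 283 used as the denominator in the table.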




Further analysis of the NOV5a protein yielded the following properties shown in 
Table 5C. 



Table 5C. Protein Sequence Properties NOV5a


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3600 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability 
located in endoplasmic reticulum (membrane) 


SignalP 
analysis:


No Known Signal Sequence Predicted 



A search of the NOV5a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 5D.



Table 5D. Geneseq Results for NOV5a








Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date] 


NOV5a
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 
AAY60569


Human normal bladder tissue EST
encoded protein 241 - Homo sapiens,
227 aa. [DE19818620-A1, 28-OCT-1999]


1..211
4..214


210/211 (99%) 
210/211 (99%) 


e-116 


AAG03416 


Human secreted protein, SEQ ID NO: 
7497 - Homo sapiens, 87 aa. 
[EP1033401-A2, 06-SEP-2000] 


6..86 
7..87 


79/81 (97%) 
79/81 (97%) 


2e-36 


AAM35994 


Peptide #10031 encoded by probe for
measuring placental gene expression -
Homo sapiens, 40 aa. [WO200157272-A2, 09-AUG-2001]


189..228
1..40


40/40(100%) 
40/40(100%) 


4e-17 


AAM20726 


Peptide #7160 encoded by probe for 
measuring cervical gene expression - 
Homo sapiens, 40 aa. [WO200157278-A2, 09-AUG-2001]


189..228 
1..40 


40/40(100%) 
40/40(100%) 


4e-17 


AAM75883


Human bone marrow expressed probe
encoded protein SEQ ID NO: 36189 -
Homo sapiens, 40 aa. [WO200157276-A2, 09-AUG-2001]


189..228 
1..40 


40/40(100%) 
40/40(100%) 


4e-17 


In a BLAST search of public sequence databases, the NOV5a protein was found to
have homology to the proteins shown in the BLASTP data in Table 5E. 


Table 5E. Public BLASTP Results for NOV5a 


Protein
Accession
Number


Protein/Organism/Length


NOV5a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9CXZ5 


2510048L02Rik protein - Mus musculus 
(Mouse), 403 aa. 


7..296 
125..403 


220/290 (75%) 
242/290 (82%) 


e-117


Q9D4W5 


2510048L02Rik protein - Mus musculus 
(Mouse), 267 aa. 


7..139 
133..262 


106/133(79%) 
116/133 (86%) 


1e-51


P30427 


Plectin 1 (PLTN) (PCN) - Rattus 
norvegicus (Rat), 4687 aa. 


6..211
2592..2783 


55/210(26%) 
101/210(47%) 


2e-07 


Q9CW93 


0610037D15Rik protein - Mus 
musculus (Mouse), 322 aa (fragment). 


34..180 
81..223 


47/150 (31%) 
74/150(49%) 


3e-06 


Q9JI55 


Plectin 1 (PLTN) (PCN) (300-kDa 
intermediate filament-associated 
protein) (IFAP300) - Cricetulus griseus 
(Chinese hamster), 4473 aa (fragment). 


6..213 
2378..2571


54/212(25%) 
100/212(46%) 


7e-06 



PFam analysis predicts that the NOV5a protein contains the domains shown in
Table 5F.

Table 5F. Domain Analysis of NOV5a



Pfam Domain


NOV5a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 





Example 6. 

The NOV6 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 6A.



Table 6A. NOV6 Sequence Analysis



SEQ ID NO: 19 556 bp


NOV6a,

CG102975-01 DNA
Sequence 


TGCAATATAGGGGAAAAGCAGACCATGGTGAATCCGGGCAGCAGCTCGCAGCCGCCCCCGGTG 
ACGGCCGGCCCCCTCTCCTGGAAGCGGTGCGCAGGCTGCGGGGGCAAGATTGCGGACCGCTTT 
CTGCTCTATGCCATGGACAGCTATTGGCACAGCCGGTGCCTCAAGTGCTCCTGCTGCCAGGCG 
CAGCTGGGCGACATCGGCACGTCCTGTTACACCAAAAGTGGCATGATCCTTTGCAGAAATGAC 
TACATTAGGTTATTTGGAAATAGCGGTGCTTGCAGCGCTTGCGGACAGTCGATTCCTGCGAGT
GAACTCGTCATGAGGGCGCAAGGCAATGTGTATCATCTTAAGTGTTTTACATGCTCTACCTGC 
CGGAATCGCCTGGTCCCGGGAGATCGGTTTCACTACATCAATGGCCATTTGAATTCACTTCAG 
AGCAATCCACTACTGCCAGACCAGAAGGTCTGCTAAAAGGTCAGAGTAATGCAGAATGCGTGC 
CTTCATCTCAGATTTGTTCATCACAGGTGGATCCCATGTGTCTTCAGTAGAC 




ORF Start: ATG at 25 


ORF Stop: TAA at 475 




SEQ ID NO: 20 150 aa


MW at 16348.6kD


NOV6a,

CG102975-01 Protein
Sequence


MVNPGSSSQPPPVTAGPLSWKRCAGCGGKIADRFLLYAMDSYWHSRCLKCSCCQAQLGDIGTS 
CYTKSGMILCRNDYIRLFGNSGACSACGQSIPASELVMRAQGNVYHLKCFTCSTCRNRLVPGD
RFHYINGHLNSLQSNPLLPDQKVC 




SEQ ID NO: 21 619 bp




NOV6b,

CG102975-02 DNA
Sequence


TTGCAGATTCGCCCTTATTGCAATATAGGGGAAAAGCAGACCATGGTGAATCCGGGCAGCAGC 


TCGCAGCCGCCCCCGGTGACGGCCGGCTCCCTCTCCTGGAAGCGGTGCGCAGGCTGCGGGGGC 
AAGATTGCGGACCGCTTTCTGCTCTATGCCATGGACAGCTATTGGCACAGCCGGTGCCTCAAG 
TGCTCCTGCTGCCAGGCGCAGCTGGGCGACATCGGCACGTCCTGTTACACCAAAAGTGGCATG 
ATCCTTTGCAGAAATGACTACATTAGGTTATTTGGAAATAGCGGTGCTTGCAGCGCTTGCGGA 
CAGTCGATTCCTGCGAGTGAACTCGTCATGAGGGCGCAAGGCAATGTGTATCATCTTAAGTGT 
TTTACATGCTCTACCTGCCGGAATCGCCTGGTCCCGGGAGATCGGTTTCACTACATCAATGGC 
AGTTTATTTTGTGAACATGATAGACCTACAGCTCTCATCAATGGCCATTTGAATTCACTTCAG 
AGCAATCCACTACTGCCAGACCAGAAGGTCTGCTAAAAGGTCAGAGTAATGCAGAATGCGTGC 
CTTCATCTCAGATTTGTTCATCACAGGTGGATCCCATGTGTCTTCAGTAGAC 


ORF Start: ATG at 43


ORF Stop: TAA at 538




SEQ ID NO: 22 165 aa MW at 17993.4kD


NOV6b,

CG102975-02 Protein
Sequence 


MVNPGSSSQPPPVTAGSLSWKRCAGCGGKIADRFLLYAMDSYWHSRCLKCSCCQAQLGDIGTS 
CYTKSGMILCRNDYIRLFGNSGACSACGQSIPASELVMRAQGNVYHLKCFTCSTCRNRLVPGD 
RFHYINGSLFCEHDRPTALINGHLNSLQSNPLLPDQKVC 
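
The relationship between a listed DNA sequence, its ORF start, and the encoded protein can be illustrated by translating codons with the standard genetic code. The sketch below is a minimal illustration, not the method used to generate the table: it uses only the first 63 nucleotides of the NOV6a DNA sequence as listed above, builds the standard codon table from the conventional TCAG ordering, and uses a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from a 1-based start position until a stop codon or the end."""
    residues = []
    for index in range(orf_start - 1, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[index:index + 3]]
        if amino_acid == "*":  # stop codon is not part of the protein
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line (63 nt) of the NOV6a DNA sequence; the reported ORF start is 25.
NOV6A_FIRST_LINE = "TGCAATATAGGGGAAAAGCAGACCATGGTGAATCCGGGCAGCAGCTCGCAGCCGCCCCCGGTG"
nov6a_prefix = translate(NOV6A_FIRST_LINE, 25)
```

Here `nov6a_prefix` is `MVNPGSSSQPPPV`, which matches the first 13 residues of the NOV6a protein sequence listed in the table.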



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 6B. 
Table 6B. Comparison of NOV6a against NOV6b.


Protein Sequence


NOV6a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV6b


1..150
1..165


149/165 (90%) 
149/165 (90%) 



Further analysis of the NOV6a protein yielded the following properties shown in 
Table 6C. 



Table 6C. Protein Sequence Properties NOV6a


PSort
analysis:


0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability 
located in endoplasmic reticulum (membrane) 


SignalP
analysis:


No Known Signal Sequence Predicted 






A search of the NOV6a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 6D.



Table 6D. Geneseq Results for NOV6a



Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date] 


NOV6a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAW61544 


Human rhombotin-like protein - Homo 
sapiens, 165 aa. [WO9829546-A1, 09-JUL-1998]


1..150
1..165


148/165 (89%) 
148/165 (89%) 


5e-86 


AAO08461


Human polypeptide SEQ ID NO 22353
- Homo sapiens, 112 aa.
[WO200164835-A2, 07-SEP-2001]


54..150
1..112


95/112 (84%)
95/112 (84%)


7e-50 


AAW78226 



Fragment of human secreted protein 
encoded by gene 1 - Homo sapiens, 
266 aa. [WO9856804-A1, 17-DEC- 
1998] 


58..150 
115..222 


93/108 (86%) 
93/108 (86%) 


3e-49 


AAM41406 


Human polypeptide SEQ ID NO 6337 - 

Homo sapiens, 149 aa. 

[WO200153312-A1, 26-JUL-2001]


21..132 
15..126 


64/112(57%) 
81/112(72%) 


3e-37 


AAM39620


Human polypeptide SEQ ID NO 2765 - 

Homo sapiens, 145 aa. 

[WO200153312-A1, 26-JUL-2001]


21..132 
11..122 


64/112 (57%)
81/112(72%) 


3e-37 
In a BLAST search of public sequence databases, the NOV6a protein was found to
have homology to the proteins shown in the BLASTP data in Table 6E. 



Table 6E. Public BLASTP Results for NOV6a


Protein
Accession
Number


Protein/Organism/Length 


NOV6a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O00158


LIM domain transcription factor LMO4
(LIM-only protein 4) (LMO-4) (Breast 
tumor autoantigen) - Homo sapiens 
(Human), and, 165 aa. 


1..150
1..165 


149/165 (90%) 
149/165 (90%) 


3e-86 


Q8QG63


LIM only 4 protein - Brachydanio rerio
(Zebrafish) (Zebra danio), 167 aa.


1..150
1..167


117/167 (70%) 
131/167 (78%) 


4e-65 


Q924W9


Putative LMO1 homologue - Mus
musculus (Mouse), 156 aa.


21..136
22..148
22..148 


68/127 (53%) 
85/127 (66%) 


1e-37


Q99MB5


Neuronal specific transcription factor
DAT1 - Rattus norvegicus (Rat), 155 aa.


21..132
21..132


64/112(57%) 
81/112(72%) 


4e-37 


P25800 

Rhombotin-1 (Cysteine rich protein
TTG-1) (T-cell translocation protein 1)
(LIM-only protein 1) - Homo sapiens
(Human), 156 aa.


21..132
22..133


64/112(57%) 
81/112(72%) 


4e-37 



PFam analysis predicts that the NOV6a protein contains the domains shown in
Table 6F.



Table 6F. Domain Analysis of NOV6a 






Pfam Domain 


NOV6a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


LIM 


23..82 


24/62 (39%) 
50/62 (81%) 


2.8e-18 


LIM


87..139 


18/61 (30%) 
42/61 (69%) 


4.4e-12



Example 7.

The NOV7 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 7A. 
Table 7A. NOV7 Sequence Analysis



SEQ ID NO: 23



NOV7a,

CG103764-01 DNA
Sequence 



2532 bp 



CGGCGCCGCTCGCTCCCCCGACCCGGACTCCCCCATGTATGACGACTCCTACGTGCCCGGGTT 
TGAGGACTCGGAGGCGGGTTCAGCCGACTCCTACACCAGCCGCCCATCTCTGGACTCAGACGT 
CTCCCTGGAGGAGGACCGGGAGAGTGCCCGGCGTGAAGTAGAGAGCCAGGCTCAGCAGCAGCT 
CGAAAGGGCCAAGCACAAACCTGTGGCATTTGCGGTGAGGACCAATGTCAGCTACTGTGGCGT 
ACTGGATGAGGAGTGCCCAGTCCAGGGCTCTGGAGTCAACTTTGAGGCCAAAGATTTTCTGCA 
CATTAAAGAGAAGTACAGCAATGACTGGTGGATCGGGCGGCTAGTGAAAGAGGGCGGGGACAT 
CGCCTTCATCCCCAGCCCCCAGCGCCTGGAGAGCATCCGGCTCAAACAGGAGCAGAAGGCCAG 
GAGATCTGGGAACCCTTCCAGCCTGAGTGACATTGGCAACCGACGCTCCCCTCCGCCATCTCT 
AGCCAAGCAGAAGCAAAAGCAGGCGGAACATGTTCCCCCATATGACGTGGTGCCCTCCATGCG 
GCCTGTGGTGCTGGTGGGACCCTCTCTGAAAGGTTATGAGGTCACAGACATGATGCAGAAGGC 
TCTCTTCGACTTCCTCAAACACAGATTTGATGGCAGGATCTCCATCACCCGAGTCACAGCCGA 
CCTCTCCCTGGCAAAGCGATCTGTGCTCAACAATCCGGGCAAGAGGACCATCATTGAGCGCTC 
CTCTGCCCGCTCCAGCATTGCGGAAGTGCAGAGTGAGATCGAGCGCATATTTGAGCTGGCCAA 
ATCCCTGCAGCTAGTAGTGTTGGACGCTGACACCATCAACCACCCAGCACAGCTGGCCAAGAC 
CTCGCTGGCCCCCATCATCGTCTTTGTCAAAGTGTCCTCACCAAAGGTACTCCAGCGTCTCAT 
TCGCTCCCGGGGGAAGTCACAGATGAAGCACCTGACCGTACAGATGATGGCATATGATAAGCT 
GGTTCAGTGCCCACCGGAGTCATTTGATGTGATTCTGGATGAGAACCAGCTGGAGGATGCCTG 
TGAGCACCTGGCTGAGTACCTGGAGGTTTACTGGCGGGCCACGCACCACCCAGCCCCTGGCCC 
CGGACTTCTGGGTCCTCCCAGTGCCATCCCCGGACTTCAGAACCAGCAGCTGCTGGGGGAGCG 
TGGCGAGGAGCACTCCCCCCTTGAGCGGGACAGCTTGATGCCCTCTGATGAGGCCAGCGAGAG 
CTCCCGCCAAGCCTGGACAGGATCTTCACAGCGTAGCTCCCGCCACCTGGAGGAGGACTATGC 
AGATGCCTACCAGGACCTGTACCAGCCTCACCGCCAACACACCTCGGGGCTGCCTAGTGCTAA 
CGGGCATGACCCCCAAGACCGGCTTCTAGCCCAGGACTCAGAGCACAACCACAGTGACCGGAA 
CTGGCAGCGCAACCGGCCTTGGCCCAAGGATAGCTACTGACAGCCTCCTGCTGCCCTACCCTG



GCAGGCACAGGCGCAGCTGGCTGGGGGGCCCACTCCAGGCAGGGTGGCGTTAGACTGGCATCA 



GGCTGGCACTAGGCTCAGCCCCCAAAACCCCCTGCCCAGCCCCAGCTTCAGGGCTGCCTGTGG 



TCCCAAGGTTCTGGGAGAAACAGGGGACCCCCTCACCTCCTGGGCAGTGACCCCTACTAGGCT 



CCCATTCCAGGTACTAGCTGTGTGTTCTGCACCCCTGGCACCTTCCTCTCCTCCCACACAGGA 



AGCTGCCCCACTGGGCAGTGCCCTCAGGCCAGGATCCCCTTAGCAGGGTCCTTCCCACCAGAC 



TCAGGGAAGGGATGCCCCATTAAAGTGACAAAAGGGTGGGGTGTGGGCACCATGGCATGAGGA 



AGAAACAAGGTCCCTGAGCAGGCACAAGTCCTGACAGTCAAGGGACTGCTTTGGCATCCAGGG 



CCTCCAGTCACCTCACTGCCATACATTAGAAATGAGACAATCAAAGCCCCCCCCAGGGTGGCA 



CACCCATCCGTTTGCTGGGGTGTGGCAGCCACATCCAAGACTGGAGCAGCAGGCTGGCCACGC 



TCGGGCCAGAGAGAGCTCACAGCTGAAGCTCTTGGAGGGAAGGGCTCTCCTCACCCTGCCAGG 



AAGCTTCTTAACATGTGACAGGACCAGGGACCAGGAGCATGGTGAAGCCAAGTGGCAGATGGG 



AGCCAACCTGGATGGGGGTTTGGGGAAGGAGGGCATGTGTAGCAGAGAACTTAGGGGGGCCTC



CTTGCCTTTCTCATTCTTTTGCCCTGCATCCTGTCATTTCTGTTCTTGTCCCTCATACATCTT



GATGGGGAGAGACAGGGCACAGAGGACCTGTCTCCCCGGCTACTCTTGCCTTATGGCTCTAGT 



TGGAGAACCGGGCTCCAGACTTTGTTCCCTGACTCATAGCTGCCGCTTGTTAGGTTAGGGTTA 



GTGTGACCTACAGAGCATGCTCCACAAGCCCCTGCCTCACCTCACTGTCATCACTAATAAACA 





TCATGCACAGTC 




ORF Start: at 2 




ORF Stop: TGA at 1487 




SEQ ID NO: 24 


495 aa 


MW at 55582.3kD


NOV7a,

CG103764-01 Protein
Sequence 


GAARSPDPDSPMYDDSYVPGFEDSEAGSADSYTSRPSLDSDVSLEEDRESARREVESQAQQQL
ERAKHKPVAFAVRTNVSYCGVLDEECPVQGSGVNFEAKDFLHIKEKYSNDWWIGRLVKEGGDI 
AFIPSPQRLESIRLKQEQKARRSGNPSSLSDIGNRRSPPPSLAKQKQKQAEHVPPYDVVPSMR 
PVVLVGPSLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADLSLAKRSVLNNPGKRTIIERS
SARSSIAEVQSEIERIFELAKSLQLVVLDADTINHPAQLAKTSLAPIIVFVKVSSPKVLQRLI
RSRGKSQMKHLTVQMMAYDKLVQCPPESFDVILDENQLEDACEHLAEYLEVYWRATHHPAPGP 
GLLGPPSAIPGLQNQQLLGERGEEHSPLERDSLMPSDEASESSRQAWTGSSQRSSRHLEEDYA 
DAYQDLYQPHRQHTSGLPSANGHDPQDRLLAQDSEHNHSDRNWQRNRPWPKDSY 




SEQ ID NO: 25 2532 bp




NOV7b,

CG103764-01 DNA
Sequence 


CGGCGCCGCTCGCTCCCCCGACCCGGACTCCCCCATGTATGACGACTCCTACGTGCCCGGGTT 
TGAGGACTCGGAGGCGGGTTCAGCCGACTCCTACACCAGCCGCCCATCTCTGGACTCAGACGT 
CTCCCTGGAGGAGGACCGGGAGAGTGCCCGGCGTGAAGTAGAGAGCCAGGCTCAGCAGCAGCT 
CGAAAGGGCCAAGCACAAACCTGTGGCATTTGCGGTGAGGACCAATGTCAGCTACTGTGGCGT
ACTGGATGAGGAGTGCCCAGTCCAGGGCTCTGGAGTCAACTTTGAGGCCAAAGATTTTCTGCA 
CATTAAAGAGAAGTACAGCAATGACTGGTGGATCGGGCGGCTAGTGAAAGAGGGCGGGGACAT 
CGCCTTCATCCCCAGCCCCCAGCGCCTGGAGAGCATCCGGCTCAAACAGGAGCAGAAGGCCAG 
GAGATCTGGGAACCCTTCCAGCCTGAGTGACATTGGCAACCGACGCTCCCCTCCGCCATCTCT 
AGCCAAGCAGAAGCAAAAGCAGGCGGAACATGTTCCCCCATATGACGTGGTGCCCTCCATGCG 
GCCTGTGGTGCTGGTGGGACCCTCTCTGAAAGGTTATGAGGTCACAGACATGATGCAGAAGGC
TCTCTTCGACTTCCTCAAACACAGATTTGATGGCAGGATCTCCATCACCCGAGTCACAGCCGA 
CCTCTCCCTGGCAAAGCGATCTGTGCTCAACAATCCGGGCAAGAGGACCATCATTGAGCGCTC 
CTCTGCCCGCTCCAGCATTGCGGAAGTGCAGAGTGAGATCGAGCGCATATTTGAGCTGGCCAA 
ATCCCTGCAGCTAGTAGTGTTGGACGCTGACACCATCAACCACCCAGCACAGCTGGCCAAGAC 
CTCGCTGGCCCCCATCATCGTCTTTGTCAAAGTGTCCTCACCAAAGGTACTCCAGCGTCTCAT 
TCGCTCCCGGGGGAAGTCACAGATGAAGCACCTGACCGTACAGATGATGGCATATGATAAGCT 
GGTTCAGTGCCCACCGGAGTCATTTGATGTGATTCTGGATGAGAACCAGCTGGAGGATGCCTG 
TGAGCACCTGGCTGAGTACCTGGAGGTTTACTGGCGGGCCACGCACCACCCAGCCCCTGGCCC 
CGGACTTCTGGGTCCTCCCAGTGCCATCCCCGGACTTCAGAACCAGCAGCTGCTGGGGGAGCG 
TGGCGAGGAGCACTCCCCCCTTGAGCGGGACAGCTTGATGCCCTCTGATGAGGCCAGCGAGAG 
CTCCCGCCAAGCCTGGACAGGATCTTCACAGCGTAGCTCCCGCCACCTGGAGGAGGACTATGC 
AGATGCCTACCAGGACCTGTACCAGCCTCACCGCCAACACACCTCGGGGCTGCCTAGTGCTAA 
CGGGCATGACCCCCAAGACCGGCTTCTAGCCCAGGACTCAGAGCACAACCACAGTGACCGGAA 
CTGGCAGCGCAACCGGCCTTGGCCCAAGGATAGCTACTGACAGCCTCCTGCTGCCCTACCCTG 
GCAGGCACAGGCGCAGCTGGCTGGGGGGCCCACTCCAGGCAGGGTGGCGTTAGACTGGCATCA 


GGCTGGCACTAGGCTCAGCCCCCAAAACCCCCTGCCCAGCCCCAGCTTCAGGGCTGCCTGTGG 


TCCCAAGGTTCTGGGAGAAACAGGGGACCCCCTCACCTCCTGGGCAGTGACCCCTACTAGGCT 


CCCATTCCAGGTACTAGCTGTGTGTTCTGCACCCCTGGCACCTTCCTCTCCTCCCACACAGGA 


AGCTGCCCCACTGGGCAGTGCCCTCAGGCCAGGATCCCCTTAGCAGGGTCCTTCCCACCAGAC 


TCAGGGAAGGGATGCCCCATTAAAGTGACAAAAGGGTGGGGTGTGGGCACCATGGCATGAGGA 


AGAAACAAGGTCCCTGAGCAGGCACAAGTCCTGACAGTCAAGGGACTGCTTTGGCATCCAGGG 


CCTCCAGTCACCTCACTGCCATACATTAGAAATGAGACAATCAAAGCCCCCCCCAGGGTGGCA 


CACCCATCCGTTTGCTGGGGTGTGGCAGCCACATCCAAGACTGGAGCAGCAGGCTGGCCACGC 


TCGGGCCAGAGAGAGCTCACAGCTGAAGCTCTTGGAGGGAAGGGCTCTCCTCACCCTGCCAGG


AAGCTTCTTAACATGTGACAGGACCAGGGACCAGGAGCATGGTGAAGCCAAGTGGCAGATGGG 


AGCCAACCTGGATGGGGGTTTGGGGAAGGAGGGCATGTGTAGCAGAGAACTTAGGGGGGCCTC 


CTTGCCTTTCTCATTCTTTTGCCCTGCATCCTGTCATTTCTGTTCTTGTCCCTCATACATCTT 


TGGAGAACCGGGCTCCAGACTTTGTTCCCTGACTCATAGCTGCCGCTTGTTAGGTTAGGGTTA 


GATGGGGAGAGACAGGGCACAGAGGACCTGTCTCCCCGGCTACTCTTGCCTTATGGCTCTAGT 


GTGTGACCTACAGAGCATGCTCCACAAGCCCCTGCCTCACCTCACTGTCATCACTAATAAACA 


TCATGCACAGTC 




ORF Start: at 2 ORF Stop: TGA at 1487


SEQ ID NO: 26 495 aa MW at 55582.3kD


NOV7b,

CG103764-01 Protein
Sequence 



GAARSPDPDSPMYDDSYVPGFEDSEAGSADSYTSRPSLDSDVSLEEDRESARREVESQAQQQL 
ERAKHKPVAFAVRTNVSYCGVLDEECPVQGSGVNFEAKDFLHIKEKYSNDWWIGRLVKEGGDI 
AFIPSPQRLESIRLKQEQKARRSGNPSSLSDIGNRRSPPPSLAKQKQKQAEHVPPYDVVPSMR
PVVLVGPSLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADLSLAKRSVLNNPGKRTIIERS
SARSSIAEVQSEIERIFELAKSLQLVVLDADTINHPAQLAKTSLAPIIVFVKVSSPKVLQRLI
RSRGKSQMKHLTVQMMAYDKLVQCPPESFDVILDENQLEDACEHLAEYLEVYWRATHHPAPGP 
GLLGPPSAIPGLQNQQLLGERGEEHSPLERDSLMPSDEASESSRQAWTGSSQRSSRHLEEDYA 
DAYQDLYQPHRQHTSGLPSANGHDPQDRLLAQDSEHNHSDRNWQRNRPWPKDSY 




SEQ ID NO: 27 1822 bp


NOV7c, 

212779035 DNA 
Sequence 


TTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATA


TAAGCAGAGCTCTCTGGCTAACTAGAGAACCCACTGCTTACTGGCTTATCGAAATTAATACGA


CTCACTATAGGGAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATC
CACCATGTATGACGACTCCTACGTGCCCGGGTTTGAGGACTCGGAGGCGGGTTCAGCCGACTC 
CTACACCAGCCGCCCATCTCTGGACTCAGACGTCTCCCTGGAGGAGGACCGGGAGAGTGCCCG 
GCGTGAAGTAGAGAGCCAGGCTCAGCAGCAGCTCGAAAGGGCCAAGCACAAACCTGTGGCATT 
TGCGGTGAGGACCAATGTCAGCTACTGTGGCGTACTGGATGAGGAGTGCCCAGTCCAGGGCTC 
TGGAGTCAACTTTGAGGCCAAAGATTTTCTGCACATTAAAGAGAAGTACAGCAATGACTGGTG 
GATCGGGCGGCTAGTGAAAGAGGGCGGGGACATCGCCTTCATCCCCAGCCCCCAGCGCCTGGA 
GAGCATCCGGCTCAAACAGGAGCAGAAGGCCAGGAGATCTGGGAACCCTTCCAGCCTGAGTGA 
CATTGGCAACCGACGCTCCCCTCCGCCATCTCTAGCCAAGCAGAAGCAAAAGCAGGCGGAACA 
TGTTCCCCCATATGACGTGGTGCCCTCCATGCGGCCTGTGGTGCTGGTGGGACCCTCTCTGAA 
AGGTTATGAGGTCACAGACATGATGCAGAAGGCTCTCTTCGACTTCCTCAAACACAGATTTGA 
TGGCAGGATCTCCATCACCCGAGTCACAGCCGACCTCTCCCTGGCAAAGCGATCTGTGCTCAA 
CAATCCGGGCAAGAGGACCATCATTGAGCGCTCCTCTGCCCGCTCCAGCATTGCGGAAGTGCA 
GAGTGAGATCGAGCGCATATTTGAGCTGGCCAAATCCCTGCAGCTAGTAGTGTTGGACGCTGA 
CACCATCAACCACCCAGCACAGCTGGCCAAGACCTCGCTGGCCCCCATCATCGTCTTTGTCAA 
AGTGTCCTCACCAAAGGTACTCCAGCGTCTCATTCGCTCCCGGGGGAAGTCACAGATGAAGCA 
CCTGACCGTACAGATGATGGCATATGATAAGCTGGTTCAGTGCCCACCGGAGTCATTTGATGT 
GATTCTGGATGAGAACCAGCTGGAGGATGCCTGTGAGCACCTGGCTGAGTACCTGGAGGTTTA 
CTGGCGGGCCACGCACCACCCAGCCCCTGGCCCCGGACTTCTGGGTCCTCCCAGTGCCATCCC 
CGGACTTCAGAACCAGCAGCTGCTGGGGGAGCGTGGCGAGGAGCACTCCCCCCTTGAGCGGGA
CAGCTTGATGCCCTCTGATGAGGCCAGCGAGAGCTCCCGCCAAGCCTGGACAGGATCTTCACA 
GCGTAGCTCCCGCCACCTGGAGGAGGACTATGCAGATGCCTACCAGGACCTGTACCAGCCTCA 
CCGCCAACACACCTCGGGGCTGCCTAGTGCTAACGGGCATGACCCCCAAGACCGGCTTCTAGC 
CCAGGACTCAGAACACAACCACAGTGACCGGAACTGGCAGCGCAACCGGCCTTGGCCCAAGGA 
TAGCTACTGAGCGGCCGCTCGAGTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTG 


TGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAG 


GTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAG




ORF Start: at 137 | ORF Stop: TGA at 1646




SEQ ID NO: 28 | 503 aa | MW at 56533.6 kD


NOV7c, 

212779035 Protein 
Sequence 


GDPSWLAFKLKLGTELGSTMYDDSYVPGFEDSEAGSADSYTSRPSLDSDVSLEEDRESARREV 
ESQAQQQLERAKHKPVAFAVRTNVSYCGVLDEECPVQGSGVNFEAKDFLHIKEKYSNDWWIGR 
LVKEGGDIAFIPSPQRLESIRLKQEQKARRSGNPSSLSDIGNRRSPPPSLAKQKQKQAEHVPP
YDVVPSMRPVVLVGPSLKGYEVTDMMQKALFDFLKHRFDGRISITRVTADLSLAKRSVLNNPG
KRTIIERSSARSSIAEVQSEIERIFELAKSLQLVVLDADTINHPAQLAKTSLAPIIVFVKVSS
PKVLQRLIRSRGKSQMKHLTVQMMAYDKLVQCPPESFDVILDENQLEDACEHLAEYLEVYWRA
THHPAPGPGLLGPPSAIPGLQNQQLLGERGEEHSPLERDSLMPSDEASESSRQAWTGSSQRSS 
RHLEEDYADAYQDLYQPHRQHTSGLPSANGHDPQDRLLAQDSEHNHSDRNWQRNRPWPKDSY 




SEQ ID NO: 29 | 1449 bp


NOV7d, 

CG103764-02 DNA
Sequence 


ATGTATGACGACTCCTACCTGCCCGGGTTTGAGGACTCGGAGGCGGGTTCAGCCGACTCCTAC

ACCAGCCGCCCATCTCTGGACTCAGACGTCCTGGAGGAGGACCGGGAGAGTGCCCGGCGTGAA

GTAGAGAGCCAGGCTCAGCAGCAGCTCGAAAGGGCCAAGCACAAACCTGAGGCATTTGCGGTG 

AGGACCAATGTCAGCTACTGTGGCGTACTGGATGAGGAGTGCCCAGTCCAGGGCTCTGGAGTC 

AACTTTGAGGCCAAAGATTTCCTCCACATTAAAGAGAAGTACAGCAATGACTGGTGGATCGGG

CGGCTAGTGAAAGAGGGCGGGGACATCGCCTTCATCCCCAGCCCCCAGCGCCTGGAGAGCATC 

CGGCTCAAACAGGAACAGAAGGCCAGGAGATCCGGGAACCCTTCCAGCCTGAGTGACATTGGC 

AACCGACGTTCCCCTCCTCCATCTCTAGCCAAGCAGAAGCAAAAGCAGGCGGAACATGTTCCC 

CCGTATGACGTGGTGCCCTCAATGCGGCCGGTGGTGCTGGTGGGACCCTCTCTGAAAGGTTAT 

GAGGTCACAGACATGATGCAGAAGGCTCTCTTCGACTTCCTCAAACACAGATTTGATGGCAGG 

ATCTCCATCACCCGAGTCACAGCCGACCTCTCCCTGGCAAAGCGATCTGTGCTCAACAATCCG

GGCAAGAGGACCATCATTGAGCGCTCCTCTGCCCGCTCCAGCATTGCGGAAGTGCAGAGTGAG 

ATCGAGCGCATATTTGAGCTGGCCAAATCCCTGCAGCTAGTAGTGTTGGACGCTGACACCATC 

AACCACCCAGCACAGCTGGCCAAGACCTCGCTGGCCCCCATCATCGTCTTTGTCAAAGTGTCC 

TCACCAAAGGTACTCCAGCGTCTCATTCGCTCCCGGGGGAAGTCACAGATGAAGCACCTGACC 

GTACAGATGGCATATGATAAGCTGGTTCAGTGCCCACCGGAGTCATTTGATGTGATTCTGGAT 

GAGAACCAGCTGGAGGATGCCTGTGAGCTCCTGGCTGAGTACCTGGAGGTTTACTGGCGGGCC 

ACGCACCACCCAGCCCCTGGCCCCGGACTTCTGGGTCCTCCCAGTGCCATCCCCGGACTTCAG 

AACCAGCAGCTGCTGGGGGAGCGTGGCGAGGAGCACTCCCCCCTTGAGCGGGACAGCTTGATG 

CCCTCTGATGAGGCCAGCGAGAGCTCCCGCCAAGCCTGGACAGGATCTTCACAGCGTACGTCC

CGCCACCTGGAGGAGGACTACGCAGATGCCTACCAGGACCTGTACCAGCCTCACCGCCAACAC 

ACCTCGGGGCTGCCTAGTGCTAACGGGCATGACCCCCAAGACCGGCTTCTAGCCCAGGACTCA 

GAACACAACCACAGTGACCGGAACTGGCAGCGCAACCGCCCTTGGCCCAAGGATAGCTACTGA 




ORF Start: ATG at 1 | ORF Stop: TGA at 1447




SEQ ID NO: 30 | 482 aa | MW at 54347.1 kD


NOV7d, 

CG103764-02 Protein
Sequence 


MYDDSYLPGFEDSEAGSADSYTSRPSLDSDVLEEDRESARREVESQAQQQLERAKHKPEAFAV 
RTNVSYCGVLDEECPVQGSGVNFEAKDFLHIKEKYSNDWWIGRLVKEGGDIAFIPSPQRLESI 
RLKQEQKARRSGNPSSLSDIGNRRSPPPSLAKQKQKQAEHVPPYDVVPSMRPVVLVGPSLKGY
EVTDMMQKALFDFLKHRFDGRISITRVTADLSLAKRSVLNNPGKRTIIERSSARSSIAEVQSE
IERIFELAKSLQLVVLDADTINHPAQLAKTSLAPIIVFVKVSSPKVLQRLIRSRGKSQMKHLT
VQMAYDKLVQCPPESFDVILDENQLEDACELLAEYLEVYWRATHHPAPGPGLLGPPSAIPGLQ
NQQLLGERGEEHSPLERDSLMPSDEASESSRQAWTGSSQRTSRHLEEDYADAYQDLYQPHRQH
TSGLPSANGHDPQDRLLAQDSEHNHSDRNWQRNRPWPKDSY 
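
As an illustration of how an encoded polypeptide follows from a nucleotide sequence and its listed ORF start, the following Python sketch translates an open reading frame with the standard genetic code. The function name `translate_orf` and the whitespace handling are assumptions made for this example and are not part of the disclosure; the input is the first line of the NOV7d DNA sequence with extraction spaces removed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str, orf_start: int) -> str:
    """Translate from a 1-based start position up to the first in-frame stop codon."""
    seq = "".join(dna.split()).upper()  # tolerate whitespace left by text extraction
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":  # TAA, TAG and TGA end the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First 63 nucleotides of the NOV7d DNA sequence, ORF starting at position 1.
first_line = "ATGTATGACGACTCCTACCTGCCCGGGTTTGAGGACTCGGAGGCGGGTTCAGCCGACTCCTAC"
print(translate_orf(first_line, 1))  # → MYDDSYLPGFEDSEAGSADSY
```

The printed residues agree with the start of the NOV7d protein sequence listed above. The sketch raises `KeyError` on any character other than A, C, G or T, which can help flag extraction damage in a sequence line.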





SEQ ID NO: 31 | 1822 bp


NOV7e, 

CG103764-03 DNA
Sequence 


TTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATA 


TAAGCAGAGCTCTCTGGCTAACTAGAGAACCCACTGCTTACTGGCTTATCGAAATTAATACGA 


CTCACTATAGGGAGACCCAAGCTGGCTAGCGTTTAAACTTAAGCTTGGTACCGAGCTCGGATC 


CACCATGTATGACGACTCCTACGTGCCCGGGTTTGAGGACTCGGAGGCGGGTTCAGCCGACTC 
CTACACCAGCCGCCCATCTCTGGACTCAGACGTCTCCCTGGAGGAGGACCGGGAGAGTGCCCG
GCGTGAAGTAGAGAGCCAGGCTCAGCAGCAGCTCGAAAGGGCCAAGCACAAACCTGTGGCATT
TGCGGTGAGGACCAATGTCAGCTACTGTGGCGTACTGGATGAGGAGTGCCCAGTCCAGGGCTC 
TGGAGTCAACTTTGAGGCCAAAGATTTTCTGCACATTAAAGAGAAGTACAGCAATGACTGGTG 
GATCGGGCGGCTAGTGAAAGAGGGCGGGGACATCGCCTTCATCCCCAGCCCCCAGCGCCTGGA 
GAGCATCCGGCTCAAACAGGAGCAGAAGGCCAGGAGATCTGGGAACCCTTCCAGCCTGAGTGA
CATTGGCAACCGACGCTCCCCTCCGCCATCTCTAGCCAAGCAGAAGCAAAAGCAGGCGGAACA 
TGTTCCCCCATATGACGTGGTGCCCTCCATGCGGCCTGTGGTGCTGGTGGGACCCTCTCTGAA 
AGGTTATGAGGTCACAGACATGATGCAGAAGGCTCTCTTCGACTTCCTCAAACACAGATTTGA 
TGGCAGGATCTCCATCACCCGAGTCACAGCCGACCTCTCCCTGGCAAAGCGATCTGTGCTCAA 
CAATCCGGGCAAGAGGACCATCATTGAGCGCTCCTCTGCCCGCTCCAGCATTGCGGAAGTGCA 
GAGTGAGATCGAGCGCATATTTGAGCTGGCCAAATCCCTGCAGCTAGTAGTGTTGGACGCTGA 
CACCATCAACCACCCAGCACAGCTGGCCAAGACCTCGCTGGCCCCCATCATCGTCTTTGTCAA 
AGTGTCCTCACCAAAGGTACTCCAGCGTCTCATTCGCTCCCGGGGGAAGTCACAGATGAAGCA 
CCTGACCGTACAGATGATGGCATATGATAAGCTGGTTCAGTGCCCACCGGAGTCATTTGATGT 
GATTCTGGATGAGAACCAGCTGGAGGATGCCTGTGAGCACCTGGCTGAGTACCTGGAGGTTTA 
CTGGCGGGCCACGCACCACCCAGCCCCTGGCCCCGGACTTCTGGGTCCTCCCAGTGCCATCCC 
CGGACTTCAGAACCAGCAGCTGCTGGGGGAGCGTGGCGAGGAGCACTCCCCCCTTGAGCGGGA 
CAGCTTGATGCCCTCTGATGAGGCCAGCGAGAGCTCCCGCCAAGCCTGGACAGGATCTTCACA
GCGTAGCTCCCGCCACCTGGAGGAGGACTATGCAGATGCCTACCAGGACCTGTACCAGCCTCA
CCGCCAACACACCTCGGGGCTGCCTAGTGCTAACGGGCATGACCCCCAAGACCGGCTTCTAGC
CCAGGACTCAGAACACAACCACAGTGACCGGAACTGGCAGCGCAACCGGCCTTGGCCCAAGGA
TAGCTACTGAGCGGCCGCTCGAGTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTG


TGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAG 


GTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAG 




ORF Start: ATG at 194 | ORF Stop: TGA at 1646




SEQ ID NO: 32 | 484 aa | MW at 54531.3 kD


NOV7e, 

CG103764-03 Protein
Sequence 


MYDDSYVPGFEDSEAGSADSYTSRPSLDSDVSLEEDRESARREVESQAQQQLERAKHKPVAFA 
VRTNVSYCGVLDEECPVQGSGVNFEAKDFLHIKEKYSNDWWIGRLVKEGGDIAFIPSPQRLES 
IRLKQEQKARRSGNPSSLSDIGNRRSPPPSLAKQKQKQAEHVPPYDVVPSMRPVVLVGPSLKG
YEVTDMMQKALFDFLKHRFDGRISITRVTADLSLAKRSVLNNPGKRTIIERSSARSSIAEVQS
EIERIFELAKSLQLVVLDADTINHPAQLAKTSLAPIIVFVKVSSPKVLQRLIRSRGKSQMKHL
TVQMMAYDKLVQCPPESFDVILDENQLEDACEHLAEYLEVYWRATHHPAPGPGLLGPPSAIPG 
LQNQQLLGERGEEHSPLERDSLMPSDEASESSRQAWTGSSQRSSRHLEEDYADAYQDLYQPHR 
QHTSGLPSANGHDPQDRLLAQDSEHNHSDRNWQRNRPWPKDSY 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 7B. 



Table 7B. Comparison of NOV7a against NOV7b through NOV7e.

Protein Sequence | NOV7a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV7b | 1..495; 1..495 | 450/495 (90%); 450/495 (90%)
NOV7c | 10..495; 18..503 | 455/486 (93%); 455/486 (93%)
NOV7d | 12..495; 1..482 | 448/484 (92%); 450/484 (92%)
NOV7e | 12..495; 1..484 | 454/484 (93%); 454/484 (93%)
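
The whole-number identity percentages listed for Table 7B appear consistent with dividing the identity count by the aligned length and truncating the result. The following Python sketch reproduces that arithmetic; the function names are hypothetical, and truncation (rather than rounding) is an assumption inferred from the listed values, not a statement about the software used to prepare the table.

```python
def percent_identity(identities: int, aligned_length: int) -> int:
    """Whole-number percent identity, truncated (assumed) rather than rounded."""
    return (100 * identities) // aligned_length


def count_identities(aligned_a: str, aligned_b: str) -> int:
    """Count identical, non-gap columns of two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")


# Identity counts and aligned lengths as listed for Table 7B.
table_7b = [("NOV7b", 450, 495), ("NOV7c", 455, 486),
            ("NOV7d", 448, 484), ("NOV7e", 454, 484)]
for name, identities, length in table_7b:
    print(name, f"{percent_identity(identities, length)}%")
```

Under these assumptions the loop prints 90%, 93%, 92% and 93%, matching the identity column. `count_identities` shows how the numerator could be obtained from a gapped pairwise alignment in which `-` marks a gap.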






Further analysis of the NOV7a protein yielded the following properties shown in 
Table 7C. 




WO 03/023002 




PCT/US02/28539 



Table 7C. Protein Sequence Properties NOV7a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.1000 probability located in plasma membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV7a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 7D.



Table 7D. Geneseq Results for NOV7a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV7a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAW37880 | Human calcium channel b3 subunit - Homo sapiens, 484 aa. [WO9811131-A2, 19-MAR-1998] | 12..495; 1..484 | 484/484 (100%); 484/484 (100%) | 0.0
AAR39564 | Human neuronal VDCC beta-subunit encoded by clone HBB2 - Homo sapiens, 484 aa. [DE4222126-A, 19-AUG-1993] | 12..495; 1..484 | 482/484 (99%); 484/484 (99%) | 0.0
AAB10578 | Human calcium channel beta3-1 subunit protein - Homo sapiens, 483 aa. [US6096514-A, 01-AUG-2000] | 21..495; 3..483 | 473/481 (98%); 473/481 (98%) | 0.0
AAR76214 | Human calcium channel subunit beta 3-1 - Homo sapiens, 483 aa. [WO9504822-A, 16-FEB-1995] | 21..495; 3..483 | 473/481 (98%); 473/481 (98%) | 0.0
AAB10585 | Human calcium channel beta-4 subunit protein U2 - Homo sapiens, 520 aa. [US6096514-A, 01-AUG-2000] | 27..470; 50..520 | 330/474 (69%); 367/474 (76%) | e-172
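
The residue ranges in these tables can be related to the alignment lengths that appear as denominators in the identity column. The Python sketch below, using a hypothetical helper `range_length`, computes the inclusive length of a range written as `start..end`; it is offered only as a reading aid for the tables.

```python
def range_length(residue_range: str) -> int:
    """Inclusive length of a residue range written as 'start..end'."""
    start, end = (int(part) for part in residue_range.split(".."))
    return end - start + 1


# NOV7a residues 12..495 aligned to match residues 1..484 (AAW37880 row).
print(range_length("12..495"), range_length("1..484"))  # → 484 484

# For the AAB10585 row the two ranges differ in length, so the listed
# alignment length (474) would have to include gap positions.
print(range_length("27..470"), range_length("50..520"))  # → 444 471
```

When both ranges have the same length and no gaps are introduced, that length equals the denominator of the identity fraction, as in the first row.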


In a BLAST search of public sequence databases, the NOV7a protein was found to
have homology to the proteins shown in the BLASTP data in Table 7E.


Table 7E. Public BLASTP Results for NOV7a

Protein Accession Number | Protein/Organism/Length | NOV7a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
P54284 | Dihydropyridine-sensitive L-type, calcium channel beta-3 subunit (CAB3) (Voltage-dependent calcium channel beta-3 subunit) - Homo sapiens (Human), 484 aa. | 12..495; 1..484 | 484/484 (100%); 484/484 (100%) | 0.0
P54287 | Dihydropyridine-sensitive L-type, calcium channel beta-3 subunit (CAB3) (Voltage-dependent calcium channel beta-3 subunit) - Rattus norvegicus (Rat), 484 aa. | 12..495; 1..484 | 480/484 (99%); 484/484 (99%) | 0.0
S62185 | calcium channel beta3 chain a - mouse, 484 aa. | 12..495; 1..484 | 478/484 (98%); 482/484 (98%) | 0.0
AAM22963 | Calcium channel beta 3 subunit - Mus musculus (Mouse), 484 aa. | 12..495; 1..484 | 479/484 (98%); 481/484 (98%) | 0.0
Q9MZL3 | Dihydropyridine-sensitive L-type, calcium channel beta-3 subunit (CAB3) (Voltage-dependent calcium channel beta-3 subunit) - Bos taurus (Bovine), 484 aa. | 12..495; 1..484 | 477/484 (98%); 481/484 (98%) | 0.0


PFam analysis predicts that the NOV7a protein contains the domains shown in the 
Table 7F. 



Table 7F. Domain Analysis of NOV7a

Pfam Domain | NOV7a Match Region | Identities; Similarities for the Matched Region | Expect Value
Ca_channel_B | 184..401 | 183/226 (81%); 217/226 (96%) | 9.7e-182

Example 8. 

The NOV8 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 8A. 



Table 8A. NOV8 Sequence Analysis 




SEQ ID NO: 33 | 2037 bp


NOV8a, 

CG104944-01 DNA
Sequence 


CTACTTGGTCTCCTGCTTTCGCGACATGGCCTTCAATTTTGGGGCTCCCTCGGGCACCTCCGG 


TACCGCTGCAGCCACCGCGGCCCCCGGCTGGGTTTGGAGGATTTGGGACAACATCTACAACTG 


CAGGTTCTGCATTCAGCTTTTCTGCCCCAACTAACACAGGCACTACTGGACTCTTTGGTGGTA 


CTCAGAACAAAGGTTTTGGATTTGGTACTGGTTTTGGCACAACAACGGGAACTAGTACTGGTT 


TAGGTACTGGTTTGGGAACTGGACTGGGATTTGGAGGATTTAATACACAGCAGCAGCAGCAAA 


CTAGCAGTAGGTTATAGTTGCATGCCCAGTAATAAAGATGAAGATGGGCTAGTGGTTTTAGTT 
TTCAACAAAAAAGAAACAGAGATTCGAAGCCAACAACAACAGTTGGTAGAATCATTGCATAAA 
GTTTTGGGAGGAAACCAGACCCTTACTGTAAATGTAGAGGGCACTAAAACATTGCCAGATGAT 
CAGACAGAAGTTGTTATTTATGTTGTTGAGCGTTCGCCAAATGGTACTTCAAGAAGAGTTCCA
GCTACAACGCTATATGCCCATTTTGAACAAGCCAATATAAAAACACAATTGCAGCAACTTGGT
GTAACCCTTTCTATGACTAGAACAGAACTTTCTCCTGCACAGATCAGACAGCTTTTACAGAAT
CCTCCTGCTGGTGTTGATCCTATTATCTGGGAACAGGCCAAGGTAGATAACCCTGATTCTGAA
AAGTTAATTCCTGTACCAATGGTGGGTTTTAAGGAACTTCTCCGAAGACTGAAGGTTCAAGAT
CAGATGACTAAGCAGCATCAAACCAGATTAGATATCATATCTGAAGATATTAGTGAGCTACAA 
AAGAATCAAACTACATCTGTAGCCAAAATTGCACAATACAAGAGGAAACTCATGGATCTTTCC 
CATAGAACTTTACAGGTCCTAATCAAACAGGAAATTCAAAGGAAGAGTGGTTATGCCATTCAG 
GCTGATGAAGAGCAGTTGCGAGTTCAGCTGGATACGATTCAGGGTGAACTAAATGCACCTACT 
CAGTTCAAGGGCCGACTAAATGAATTGATGTCTCAAATCAGGATGCAGAATCATTTTGGAGCA 
GTCAGATCTGAAGAAAGGTATTACATAGATGCAGATCTGTTACGAGAAATCAAGCAGCATTTG 
AAACAACAACAGGAAGGCCTTAGCCATTTGATTAGCATCATTAAAGACGATCTAGAAGATATA 
AAGCTGGTCGAACATGGATTGAATGAAACCATCCACATCAGAGGTGGTGTCTTTAGTTGACAG 
TTCACAAACTTGTGTAAAGGTTTGTGAAATGCATCTTCTTACTGCATCAGACCTTCCTTAAGA 
ATGAAACCGACCACATGGAGGGAAAAAGAAAACAATTCTTTCTTGGATTGGTTTTTTGAGAAG 
TTTACTGACAAATTACTGTTCATCAAATCTGAAATAGTCACCTCACAGCTCTTCAAAGAAAAC 
CTTTGAAAGATTTATATCTAAAAGCTGTATTTACTTTAAAAGAAGTGCATAATTACCAAAATT 
GTATGTACTATTGTACATTTTTACAACAGCATTTTCTTAAACATAATCTGTGTTTAATGATTA 
TTGTCCATTGAGCCTGTACTCTGCTTTCCATACCAAGTAAATATGAAATAATCTACTTTGCAC 
ATAACAGAAGAAACTATAATTACTTGGCTGTTGGAGATTTGTACTTGAGTATAAATGTACACC 
AGTTTTTGTATTTGTGAACTCATCTGTGGGAGGAGTAAAGAAAATCCAAAAGCATTTAATGTT 
TTGTTTTTGTTCTATAAAGATATGAAAATGTATTTTTATATTATTTTACTTATTTGGAATTTA 
CAGAGCACACCTAAGCAATTAGGATATAACAAAACTACTTAACCATTTTTGCAACCATTTTGT 
TTTTTAAGCCTTTTTATTTCTAAAAAGATGAAAACTTATAAATAAATTCTTAATTTGTAATTA 



CTTTTAAAAAAAAAAAAAAAA 





ORF Start: ATG at 337 | ORF Stop: TGA at 1318


SEQ ID NO: 34 | 327 aa | MW at 37377.4 kD


NOV8a,

CG104944-01 Protein
Sequence


MPSNKDEDGLVVLVFNKKETEIRSQQQQLVESLHKVLGGNQTLTVNVEGTKTLPDDQTEVVIY
VVERSPNGTSRRVPATTLYAHFEQANIKTQLQQLGVTLSMTRTELSPAQIRQLLQNPPAGVDP
IIWEQAKVDNPDSEKLIPVPMVGFKELLRRLKVQDQMTKQHQTRLDIISEDISELQKNQTTSV
AKIAQYKRKLMDLSHRTLQVLIKQEIQRKSGYAIQADEEQLRVQLDTIQGELNAPTQFKGRLN
ELMSQIRMQNHFGAVRSEERYYIDADLLREIKQHLKQQQEGLSHLISIIKDDLEDIKLVEHGL
NETIHIRGGVFS
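
Each protein entry lists a molecular weight alongside its length. As a rough illustration of how such a figure can be estimated from sequence alone, the Python sketch below sums average amino acid residue masses and adds one water for the free termini. The mass table and the function name `molecular_weight` are assumptions of this example; the values reported in the tables may have been computed with a different mass table or software, so exact agreement is not implied.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (amino acid mass minus one water, as incorporated in a peptide chain).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water per chain accounts for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Estimated average molecular weight (Da) of an unmodified polypeptide."""
    seq = "".join(protein.split()).upper()  # tolerate stray whitespace
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# C-terminal residues of the NOV8a protein sequence, used as a short example.
print(round(molecular_weight("NETIHIRGGVFS"), 1))
```

A `KeyError` from this function indicates a character that is not one of the 20 standard one-letter codes, which can be a quick check for extraction errors in a listed sequence.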



Further analysis of the NOV8a protein yielded the following properties shown in 
Table 8B. 



Table 8B. Protein Sequence Properties NOV8a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1590 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV8a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 8C.



Table 8C. Geneseq Results for NOV8a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV8a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB42518 | Human ORFX ORF2282 polypeptide sequence SEQ ID NO:4564 - Homo sapiens, 354 aa. [WO200058473-A2, 05-OCT-2000] | 1..327; 28..354 | 327/327 (100%); 327/327 (100%) | 0.0
AAB60098 | Human transport protein TPPT-18 - Homo sapiens, 507 aa. [WO200078953-A2, 28-DEC-2000] | 1..327; 181..507 | 326/327 (99%); 327/327 (99%) | 0.0
AAB93036 | Human protein sequence SEQ ID NO:11814 - Homo sapiens, 484 aa. [EP1074617-A2, 07-FEB-2001] | 1..302; 181..482 | 297/302 (98%); 301/302 (99%) | e-166
ABB64020 | Drosophila melanogaster polypeptide SEQ ID NO 18852 - Drosophila melanogaster, 610 aa. [WO200171042-A2, 27-SEP-2001] | 1..317; 288..606 | 125/320 (39%); 193/320 (60%) | 4e-62
AAB42664 | Human ORFX ORF2428 polypeptide sequence SEQ ID NO:4856 - Homo sapiens, 237 aa. [WO200058473-A2, 05-OCT-2000] | 1..56; 182..237 | 56/56 (100%); 56/56 (100%) | 8e-25


In a BLAST search of public sequence databases, the NOV8a protein was found to
have homology to the proteins shown in the BLASTP data in Table 8D.




Table 8D. Public BLASTP Results for NOV8a

Protein Accession Number | Protein/Organism/Length | NOV8a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
P70582 | Nucleoporin p54 - Rattus norvegicus (Rat), 510 aa. | 1..327; 184..510 | 315/327 (96%); 325/327 (99%) | e-179
Q96EA7 | Unknown (protein for MGC:13407) - Homo sapiens (Human), 493 aa. | 1..313; 181..493 | 312/313 (99%); 313/313 (99%) | e-175
Q9P0I1 | Nucleoporin p54 protein - Homo sapiens (Human), 505 aa. | 1..327; 180..505 | 311/327 (95%); 315/327 (96%) | e-173
Q9NVL5 | CDNA FLJ10655 fis, clone NT2RP2005933, weakly similar to nucleoporin NUP57 - Homo sapiens (Human), 484 aa. | 1..302; 181..482 | 297/302 (98%); 301/302 (99%) | e-166
Q9V6B9 | CG8831 protein (LD24841P) - Drosophila melanogaster (Fruit fly), 610 aa. | 1..317; 288..606 | 125/320 (39%); 193/320 (60%) | 1e-61
PFam analysis predicts that the NOV8a protein contains the domains shown in
Table 8E.



Table 8E. Domain Analysis of NOV8a

Pfam Domain | NOV8a Match Region | Identities; Similarities for the Matched Region | Expect Value





Example 9.

The NOV9 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 9A.


Table 9A. NOV9 Sequence Analysis


SEQ ID NO: 35 | 1471 bp


NOV9a, 

CG106550-01 DNA
Sequence 


TTGAACATGGACGAAGGAATTCCTCATTTGCAAGAGAGACAGTTACTGGAACATAGAGATTTT
ATAGGACTGGACTATTCCTCTTTGTATATGTGTAAACCCAAAAGGAGCATGAAACGAGACGAC 
ACCAAGGATACCTACAAATTACCGCACAGATTAATAGAAAAGAAAAGAAGAGACCGAATTAAT 
GAATGCATTGCTCAGCTGAAAGATTTACTGCCTGAACATCTGAAATTGACAACTCTGGGACAT 
CTGGAGAAAGCTGTAGTCTTGGAATTAACTTTGAAACACTTAAAAGCTTTAACCGCCTTAACC
GAGCAACAGCATCAGAAGATAATTGCTTTACAGAATGGGGAGCGATCTCTGAAATCGCCCATT 
CAGTCCGACTTGGATGCGTTCCACTCGGGATTTCAAACATGCGCCAAAGAAGTCTTGCAATAC 
CTCTCCCGGTTTGAGAGCTGGACACCCAGGGAGCCGCGGTGTGTCCAGCTGATCAACCACTTG
CACGCCGTGGCCACCCAGTTCTTGCCCACCCCGCAGCTGTTGACTCAACAGGTCCCTCTGAGC 
AAAGGCACCGGCGCTCCCTCGGCCGCCGGGTCCGCGGCCGCCCCCTGCCTGGAGCGCGCGGGG 
CAGAAGCTGGAGCCCCTCGCCTACTGCGTGCCCGTCATCCAGCGGACTCAGCCCAGCGCCGAG
CTCGCCGCCGAGAACGACACGGACACCGACAGCGGCTACGGCGGCGAAGCCGAGGCCCGGCCG 
GACCGCGAGAAAGGCAAAGGCGCGGGGGCGAGCCGCGTCACCATCAAGCAGGAGCCTCCCGGG 
GAGGACTCGCCGGCGCCCAAGAGGATGAAGCTGGATTCCCGCGGCGGCGGCAGCGGCGGCGGC 
CCGGGGGGCGGCGCGGCGGCGGCGGCAGCCGCGCTTCTGGGGCCCGACCCTGCCGCCGCGGCC 
GCGCTGCTGAGACCCGACGCCGCCCTGCTCAGCTCGCTGGTGGCGTTCGGCGGAGGCGGAGGC 
GCGCCCTTCCCGCAGCCCGCGGCCGCCGCGGCCCCCTTCTGCCTGCCCTTCTGCTTCCTCTCG 
CCTTCTGCAGCTGCCGCCTACGTGCAGCCCTTCCTGGACAAGAGCGGCCTGGAGAAGTATCTG 
TACCCGGCGGCGGCTGCCGCCCCGTTCCCGCTGCTATACCCCGGCATCCCCGCCCCGGCGGCA 
GCCGCGGCAGCCGCCGCCGCCGCTGCCGCCGCCGCCGCCGCGTTCCCCTGCCTGTCCTCGGTG 
TTGTCGCCCCCTCCCGAGAAGGCGGGCGCCGCCGCCGCGACCCTCCTGCCGCACGAGGTGGCG 
CCCCTTGGGGCGCCGCACCCCCAGCACCCGCACGGCCGCACCCACCTGCCCTTCGCCGGGCCC 
CGCGAGCCGGGGAACCCGGAGAGCTCTGCTCAGGAAGATCCCTCGCAGCCAGGAAAGGAAGCT 
CCCTGAATCCTTGCGTCCCGAA 




ORF Start: ATG at 7 | ORF Stop: TGA at 1453




SEQ ID NO: 36 | 482 aa | MW at 50496.7 kD


NOV9a, 

CG106550-01 Protein
Sequence


MDEGIPHLQERQLLEHRDFIGLDYSSLYMCKPKRSMKRDDTKDTYKLPHRLIEKKRRDRINEC 
IAQLKDLLPEHLKLTTLGHLEKAVVLELTLKHLKALTALTEQQHQKIIALQNGERSLKSPIQS
DLDAFHSGFQTCAKEVLQYLSRFESWTPREPRCVQLINHLHAVATQFLPTPQLLTQQVPLSKG
TGAPSAAGSAAAPCLERAGQKLEPLAYCVPVIQRTQPSAELAAENDTDTDSGYGGEAEARPDR 
EKGKGAGASRVTIKQEPPGEDSPAPKRMKLDSRGGGSGGGPGGGAAAAAAALLGPDPAAAAAL 
LRPDAALLSSLVAFGGGGGAPFPQPAAAAAPFCLPFCFLSPSAAAAYVQPFLDKSGLEKYLYP 
AAAAAPFPLLYPGIPAPAAAAAAAAAAAAAAAAFPCLSSVLSPPPEKAGAAAATLLPHEVAPL 
GAPHPQHPHGRTHLPFAGPREPGNPESSAQEDPSQPGKEAP 




SEQ ID NO: 37 | 628 bp


NOV9b,

CG106550-02 DNA
Sequence


TTGAACATGGACGAAGGAATTCCTCATTTGCAAGAGAGACAGTTACCGGAACATAGAGATTTT
ATAGGACTGGACTATTCCTCTTTGTATATGTGTAAACCCAAAAGGAGCATGAAACGAGACGAC 
ACCAAGGATACCTACAAATTACCGCACAGATTAATAGAAAAGAAAAGAAGAGACCGAATTAAT 
GAATGCATTGCTCAGCTGAAAGATTTACTGCCTGAACATCTGAAATTGACAACTCTGGGACAT 
CTGGAGAAAGCTGTAGTCTTGGAATTAACTTTGAAACACTTAAAAGCTTTAACCGCCTTAACC 
GAGCAACAGCATCAGAAGATAATTGCTTTACAGAATGGGGAGCGATCTCTGAAATCGCCCATT 
CAGTCCGACTTGGATGCGTTCCACTCGGGATTTCAAACATGCGCCAAAGAAGTCTTGCAATAC 
CTCTCCCGGTTTGAGGGCTGGACACCCAGGGAGCCGCGGTGTGTCCAGCTGATCAACCACTTG 
CACGCCGTGGCCACCCAGTTCCTGCCCTTCGCCGGGCCCCGCGAGCCGGGGAACCCGGAGAGC 
TCTGCTCAAGAAGATCCCTCGCAGCCAGGAAAGGAAGCTCCCTGAATCCTTGCGTCCCGAA 




ORF Start: at 1 | ORF Stop: TGA at 610




SEQ ID NO: 38 | 203 aa | MW at 23298.4 kD


NOV9b, 

CG106550-02 Protein
Sequence 


LNMDEGIPHLQERQLPEHRDFIGLDYSSLYMCKPKRSMKRDDTKDTYKLPHRLIEKKRRDRIN 
ECIAQLKDLLPEHLKLTTLGHLEKAVVLELTLKHLKALTALTEQQHQKIIALQNGERSLKSPI
QSDLDAFHSGFQTCAKEVLQYLSRFEGWTPREPRCVQLINHLHAVATQFLPFAGPREPGNPES 
SAQEDPSQPGKEAP 
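
The ORF coordinates and the protein lengths listed for each entry are related by simple arithmetic: the number of residues is the distance from the ORF start to the first base of the stop codon, divided by three. The Python sketch below checks this for several entries from the tables above; the helper name `orf_protein_length` is hypothetical, and the coordinates are taken as 1-based positions, which is an assumption consistent with the listed values.

```python
def orf_protein_length(orf_start: int, stop_codon_pos: int) -> int:
    """Residues encoded between a 1-based ORF start and the stop codon's first base."""
    span = stop_codon_pos - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the ORF start")
    return span // 3


# name: (ORF start, ORF stop position, listed protein length in aa)
entries = {
    "NOV7d": (1, 1447, 482),
    "NOV8a": (337, 1318, 327),
    "NOV9a": (7, 1453, 482),
    "NOV9b": (1, 610, 203),
}
for name, (start, stop, listed) in entries.items():
    computed = orf_protein_length(start, stop)
    print(name, computed, "matches" if computed == listed else "differs")
```

A mismatch or a `ValueError` from this check would suggest that a coordinate was damaged during text extraction rather than that the underlying sequence is inconsistent.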



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 9B. 



Table 9B. Comparison of NOV9a against NOV9b.

Protein Sequence | NOV9a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV9b | 1..193; 3..203 | 163/201 (81%); 165/201 (81%)

Further analysis of the NOV9a protein yielded the following properties shown in 
Table 9C. 



Table 9C. Protein Sequence Properties NOV9a

PSort analysis: 0.7000 probability located in plasma membrane; 0.3000 probability located in nucleus; 0.2000 probability located in endoplasmic reticulum (membrane); 0.1000 probability located in mitochondrial inner membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV9a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 9D.



Table 9D. Geneseq Results for NOV9a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV9a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB70692 | Human DEC2a protein sequence SEQ ID NO:2 - Homo sapiens, 482 aa. [WO200114551-A1, 01-MAR-2001] | 1..482; 1..482 | 482/482 (100%); 482/482 (100%) | 0.0
AAB70693 | Human DEC2b protein sequence SEQ ID NO:12 - Homo sapiens, 484 aa. [WO200114551-A1, 01-MAR-2001] | 1..482; 1..484 | 482/484 (99%); 482/484 (99%) | 0.0
AAB70694 | Mouse DEC2a protein sequence SEQ ID NO:14 - Mus musculus, 410 aa. [WO200114551-A1, 01-MAR-2001] | 1..482; 1..410 | 350/484 (72%); 371/484 (76%) | e-178
AAU16188 | Human novel secreted protein, Seq ID 1141 - Homo sapiens, 165 aa. [WO200155322-A2, 02-AUG-2001] | 37..201; 1..165 | 165/165 (100%); 165/165 (100%) | 1e-90
AAU16603 | Human novel secreted protein, Seq ID 1556 - Homo sapiens, 150 aa. [WO200155322-A2, 02-AUG-2001] | 37..186; 1..150 | 148/150 (98%); 148/150 (98%) | 9e-81



In a BLAST search of public sequence databases, the NOV9a protein was found to
have homology to the proteins shown in the BLASTP data in Table 9E.



Table 9E. Public BLASTP Results for NOV9a

Protein Accession Number | Protein/Organism/Length | NOV9a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9C0J9 | Class B basic helix-loop-helix protein 3 (bHLHB3) (Differentially expressed in chondrocytes protein 2) (hDEC2) (Enhancer-of-split and hairy-related protein 1) (SHARP-1) - Homo sapiens (Human), 482 aa. | 1..482; 1..482 | 482/482 (100%); 482/482 (100%) | 0.0
Q8TAT1 | Basic helix-loop-helix domain containing, class B, 3 - Homo sapiens (Human), 482 aa. | 1..482; 1..482 | 481/482 (99%); 481/482 (99%) | 0.0
Q99PV5 | Class B basic helix-loop-helix protein 3 (bHLHB3) (Differentially expressed in chondrocytes protein 2) (mDEC2) - Mus musculus (Mouse), 410 aa. | 1..482; 1..410 | 350/484 (72%); 371/484 (76%) | e-177
O35779 | Class B basic helix-loop-helix protein 3 (bHLHB3) (Enhancer-of-split and hairy-related protein 1) (SHARP-1) - Rattus norvegicus (Rat), 410 aa. | 1..482; 1..410 | 348/484 (71%); 371/484 (75%) | e-176
O35185 | Class B basic helix-loop-helix protein 2 (bHLHB2) (Stimulated with retinoic acid 13) (E47 interaction protein 1) (eip1) - Mus musculus (Mouse), 411 aa. | 14..401; 20..364 | 179/405 (44%); 224/405 (55%) | 4e-66



PFam analysis predicts that the NOV9a protein contains the domains shown in the 
Table 9F. 



Table 9F. Domain Analysis of NOV9a

Pfam Domain | NOV9a Match Region | Identities; Similarities for the Matched Region | Expect Value
HLH | 48..100 | 14/57 (25%); 39/57 (68%) | 0.0055



Example 10. 



The NOV10 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 10A.

Table 10A. NOV10 Sequence Analysis




SEQ ID NO: 39 | 1344 bp


NOV10a,

CG106842-01 DNA
Sequence


TCGTCCGCAAAGCCTGAGTCCTGTCCTTTCTCTCTCCCCGGACAGCATGAGCTTCACCACTCG 


CTCCACCTTCTCCACCAACTACCGGTCCCTGGGCTCTGTCCAGGCGCCCAGCTACGGCGCCCG 
GCCGGTCAGCAGCGCGGCCAGCGTCTATGCAGGCGCTGGGGGCCTGGCCACCGGGATAGCCGG 
GGGTCTGGCAGGAATGGGAGGCATCCAGAACGAGAAGGAGACCATGCAAAGCCTGAACGACCG 
CCTGGCCTCTTACCTGGACAGAGTGAGGAGCCTGGAGACCGAGAACCGGAGGCTGGAGAGCAA 
AATCCGGGAGCACTTGGAGAAGAAGGGACCCCAGGTCAGAGACTGGAGCCATTACTTCAAGAT 
CATCGAGGACCTGAGGGCTCAGATCTTCGCAAATACTGTGGACAATGCCCGCATCGTTCTGCA 
GATTGACAATGCCCGTCTTGCTGCTGATGACTTTAGAGTCAAGTATGAGACAGAGCTGGCCAT 
GCGCCAGTCTGTGGAGAACGACATCCATGGGCTCCGCAAGGTCATTGATGACACCAATATCAC 
ACGACTGCAGCTGGAGACAGAGATCGAGGCTCTCAAGGAGGAGCTGCTCTTCATGAAGAAGAA
CCACGAAGAGGAAGTAAAAGGCCTACAAGCCCAGATTGCCAGCTCTGGGTTGACCGTGGAGGT 
AGATGCCCCCAAATCTCAGGACCTCGCCAAGATCATGGCAGACATCCGGGCCCAATATGACGA 
GCTGGCTCGGAAGAACCGAGAGGAGCTAGACAAGTACTGGTCTCAGCAGATTGAGGAGAGCAC 
CACAGTGGTCACCACACAGTCTGCTGAGGTTGGAGCTGCTGAGACGACGCTCACAGAGCTGAG 
ACGTACAGTCCAGTCCTTGGAGATCGACCTGGACTCCATGAGAAATCTGAAGGCCAGCTTGGA 
GAACAGCCTGAGGGAGGTGGAGGCCCGCTACGCCCTACAGATGGAGCAGCTCAACGGGATCCT 
GCTGCACCTTGAGTCAGAGCTGGCACAGACCCGGGCAGAGGGACAGCGCCAGGCCCAGGAGTA 
TGAGGCCCTGCTGAACATCAAGGTCAAGCTGGAGGCTGAGATCGCCACCTACCGCCGCCTGCT 
GGAAGATGGCGAGGACTTTAATCTTGGTGATGCCTTGGACAGCAGCAACTCCATGCAAACCAT 
CCAAAAGACCACCACCCGCCGGATAGTGGATGGCAAAGTGGTGTCTGAGACCAATGACACCAA 
AGTTCTGAGGCATTAAGCCAGCAGAAGCAGGGTACCCTTTGGGGAGCAGGAGGCCAATAAAAA 


GTTCAGAGTTCATTGGATGTC 




ORF Start: ATG at 47 | ORF Stop: TAA at 1274




SEQ ID NO: 40 | 409 aa | MW at 46045.1 kD


NOV10a,

CG106842-01 Protein
Sequence 


MSFTTRSTFSTNYRSLGSVQAPSYGARPVSSAASVYAGAGGLATGIAGGLAGMGGIQNEKETM 
QSLNDRLASYLDRVRSLETENRRLESKIREHLEKKGPQVRDWSHYFKIIEDLRAQIFANTVDN 
ARIVLQIDNARLAADDFRVKYETELAMRQSVENDIHGLRKVIDDTNITRLQLETEIEALKEEL 
LFMKKNHEEEVKGLQAQIASSGLTVEVDAPKSQDLAKIMADIRAQYDELARKNREELDKYWSQ
QIEESTTVVTTQSAEVGAAETTLTELRRTVQSLEIDLDSMRNLKASLENSLREVEARYALQME
QLNGILLHLESELAQTRAEGQRQAQEYEALLNIKVKLEAEIATYRRLLEDGEDFNLGDALDSS
NSMQTIQKTTTRRIVDGKVVSETNDTKVLRH



SEQ ID NO: 41 | 1412 bp


NOV10b,

CG106842-02 DNA
Sequence


CGGGGTCGTCCGCAAAGCCTGAGTCCTGTCCTTTCTCTCTCCCCGGACAGCATGAGCTTCACC



ACTCGCTCCACCTTCTCCACCAACTACCGGTCCCTGGGCTCTGTCCAGGCGCCCAGCTACGGC 
GCCCGGCCGGTCAGCAGCGCGGCCAGCGTCTATGCAGGCGCTGGGGGCTCTGGTTCCCGGATC 
TCCGTGTCCCGCTCCACCAGCTTCAGGGGCGGCATGGGGTCCGGGGGCCTGGCCACCGGGATA 
GCCGGGGGTCTGGCAGGAATGGGAGGCATCCAGAACGAGAAGGAGACCATGCAAAGCCTGAAC 
GACCGCCTGGCCTCTTACCTGGACAGAGTGAGGAGCCTGGAGACCGAGAACCGGAGGCTGGAG 
AGCAAAATCCGGGAGCACTTGGAGAAGAAGGGACCCCAGGTCAGAGACTGGAGCCATTACTTC 
AAGATCATCGAGGACCTGAGGGCTCAGATCTTCGCAAATACTGTGGACAATGCCCGCATCGTT 
CTGCAGATTGACAATGCCCGTCTTGCTGCTGATGACTTTAGAGTCAAGTATGAGACAGAGCTG 
GCCATGCGCCAGTCTGTGGAGAACGACATCCATGGGCTCCGCAAGGTCATTGATGACACCAAT 
ATCACACGACTGCAGCTGGAGACAGAGATCGAGGCTCTCAAGGAGGAGCTGCTCTTCATGAAG 
AAGAACCACGAAGAGGAAGTAAAAGGCCTACAAGCCCAGATTGCCAGCTCTGGGTTGACCGTG 
GAGGTAGATGCCCCCAAATCTCAGGACCTCGCCAAGATCATGGCAGACATCCGGGCCCAATAT 
GACGAGCTGGCTCGGAAGAACCGAGAGGAGCTAGACAAGTACTGGTCTCAGCAGATTGAGGAG 
AGCACCACAGTGGTCACCACACAGTCTGCTGAGGTTGGAGCTGCTGAGACGACGCTCACAGAG 
CTGAGACGTACAGTCCAGTCCTTGGAGATCGACCTGGACTCCATGAGAAATCTGAAGGCCAGC 
TTGGAGAACAGCCTGAGGGAGGTGGAGGCCCGCTACGCCCTACAGATGGAGCAGCTCAACGGG 
ATCCTGCTGCACCTTGAGTCAGAGCTGGCACAGACCCGGGCAGAGGGACAGCGCCAGGCCCAG 
GAGTATGAGGCCCTGCTGAACATCAAGGTCAAGCTGGAGGCTGAGATCGCCACCTACCGCCGC 
CTGCTGGAAGATGGCGAGGACTTTAATCTTGGTGATGCCTTGGACAGCAGCAACTCCATGCAA
ACCATCCAAAAGACCACCACCCGCCGGATAGTGGATGGCAAAGTGGTGTCTGAGACCAATGAC 
ACCAAAGTTCTGAGGCATTAAGCCAGCAGAAGCAGGGTACCCTTTGGGGAGCAGGAGGCCAAT 



AAAAAGTTCAGAGTTCATTGGATGTC



ORF Start: ATG at 52 | ORF Stop: TAA at 1342



SEQ ID NO: 42 | 430 aa | MW at 48057.2 kD



NOV10b,

CG106842-02 Protein
Sequence 



MSFTTRSTFSTNYRSLGSVQAPSYGARPVSSAASVYAGAGGSGSRISVSRSTSFRGGMGSGGL 
ATGIAGGLAGMGGIQNEKETMQSLNDRLASYLDRVRSLETENRRLESKIREHLEKKGPQVRDW 
SHYFKIIEDLRAQIFANTVDNARIVLQIDNARLAADDFRVKYETELAMRQSVENDIHGLRKVI 
DDTNITRLQLETEIEALKEELLFMKKNHEEEVKGLQAQIASSGLTVEVDAPKSQDLAKIMADI 
RAQYDELARKNREELDKYWSQQIEESTTVVTTQSAEVGAAETTLTELRRTVQSLEIDLDSMRN
LKASLENSLREVEARYALQMEQLNGILLHLESELAQTRAEGQRQAQEYEALLNIKVKLEAEIA
TYRRLLEDGEDFNLGDALDSSNSMQTIQKTTTRRIVDGKVVSETNDTKVLRH



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 10B.



Table 10B. Comparison of NOV10a against NOV10b.

Protein Sequence | NOV10a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV10b | 1..409; 1..430 | 381/430 (88%); 381/430 (88%)



Further analysis of the NOV10a protein yielded the following properties shown in
Table 10C.


Table 10C. Protein Sequence Properties NOV10a

PSort analysis: 0.8477 probability located in mitochondrial intermembrane space; 0.7065 probability located in mitochondrial matrix space; 0.3907 probability located in mitochondrial inner membrane; 0.3907 probability located in mitochondrial outer membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV10a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 10D.

Table 10D. Geneseq Results for NOV10a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV10a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB90795 | Human shear stress-response protein SEQ ID NO: 90 - Homo sapiens, 430 aa. [WO200125427-A1, 12-APR-2001] | 1..409; 1..430 | 409/430 (95%); 409/430 (95%) | 0.0
AAG74328 | Human colon cancer antigen protein SEQ ID NO:5092 - Homo sapiens, 452 aa. [WO200122920-A2, 05-APR-2001] | 1..409; 23..452 | 409/430 (95%); 409/430 (95%) | 0.0
ABG16550 | Novel human diagnostic protein #16541 - Homo sapiens, 447 aa. [WO200175067-A2, 11-OCT-2001] | 1..409; 18..447 | 409/430 (95%); 409/430 (95%) | 0.0
ABG15224 | Novel human diagnostic protein #15215 - Homo sapiens, 456 aa. [WO200175067-A2, 11-OCT-2001] | 1..409; 24..456 | 388/433 (89%); 397/433 (91%) | 0.0
ABG08564 | Novel human diagnostic protein #8555 - Homo sapiens, 449 aa. [WO200175067-A2, 11-OCT-2001] | 1..409; 16..449 | 392/434 (90%); 394/434 (90%) | 0.0


In a BLAST search of public sequence databases, the NOV10a protein was found to
have homology to the proteins shown in the BLASTP data in Table 10E.


Table 10E. Public BLASTP Results for NOV10a

Protein Accession Number | Protein/Organism/Length | NOV10a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
S05481 | keratin 18, type I, cytoskeletal - human, 430 aa. | 1..409; 1..430 | 409/430 (95%); 409/430 (95%) | 0.0
P05783 | Keratin, type I cytoskeletal 18 (Cytokeratin 18) (K18) (CK 18) - Homo sapiens (Human), 429 aa. | 2..409; 1..429 | 408/429 (95%); 408/429 (95%) | 0.0
Q96GD2 | Similar to keratin 18 - Homo sapiens (Human), 375 aa (fragment). | 38..409; 4..375 | 371/372 (99%); 372/372 (99%) | 0.0
P05784 | Keratin, type I cytoskeletal 18 (Cytokeratin 18) (Cytokeratin endo B) (Keratin D) - Mus musculus (Mouse), 422 aa. | 2..409; 1..422 | 359/422 (85%); 385/422 (91%) | 0.0
I59463 | keratin, type I, cytoskeletal - mouse, 423 aa. | 1..409; 1..423 | 358/423 (84%); 385/423 (90%) | 0.0



PFam analysis predicts that the NOV10a protein contains the domains shown in 
Table 10F. 



Table 10F. Domain Analysis of NOV10a 

Pfam Domain | NOV10a Match Region | Identities/Similarities for the Matched Region | Expect Value 
filament | 58..370 | 158/360 (44%) / 283/360 (79%) | 4.4e-157 



Example 11. 

The NOV11 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 11A. 



Table 11A. NOV11 Sequence Analysis 

SEQ ID NO: 43 | 1197 bp 

NOV11a, CG107095-01 DNA Sequence 


CCACTGGGACATATGTGGTGTTCCTTCCTAGCTCCTGTCTCCTCCTCATGCCTTTGTTGGGTA 


TGGGCATGTTAGGGGGAAGGTCATTGCTGTCAGAGGGGCACTGACTTTCTAATGGTGTTACCC 


AAGGTGAATGTTGGAGACACAGTCGCGATGCTGCCCAAGTCCCGGCGAGCCCTAACTATCCAG 
GAGATCGCTGCGCTGGCCAGGTCCTCCCTGCATGGTATTTCCCAGGTGGTGAAGGACCACGTG 
ACCAAGCCTACCGCCATGGCCCAGGGCCGAGTGGCTCACCTCATTGAGTGGAAGGGCTGGAGC 
AAGCCGAGTGACTCACCTGCTGCCCTGGAATCAGCCTTTTCCTCCTATTCAGACCTCAGCGAG 
GGCGAACAAGAGGCTCGCTTTGCAGCAGGAGTGGCTGAGCAGTTTGCCATCGCGGAAGCCAAG 
CTCCGAGCATGGTCTTCGGTGGATGGCGAGGACTCCACTGATGACTCCTATGATGAGGACTTT 
GCTGGGGGAATGGACACAGACATGGCTGGGCAGCTGCCCCTGGGGCCGCACCTCCAGGACCTG 
TTCACCGGCCACCGGTTCTCCCGGCCTGTGCGCCAGGGCTCCGTGGAGCCTGAGAGCGACTGC 
TCACAGACCATGTCCCCAGACACCCTGTGCTCTAGTCTGTGCAGCCTGGAGGATGGGTTGTTG 
GGCTCCCCGGCCCGGCTGGCCTCCCAGCTGCTGGGCGATGAGCTGCTTCTCGCCAAACTGCCC 
CCCAGCCGGGAAAGTGCCTTCCGCAGCCTGGGCCCACTGGAGGCCCAGGACTCACTCTACAAC 
TCGCCCCTCACAGAGTCCTGCCTTTCCCCCGCGGAGGAGGAGCCAGCCCCCTGCAAGGACTGC 
CAGCCACTCTGCCCACCACTAACGGGCAGCTGGGAACGGCAGCGGCAAGCCTCTGACCTGGCC 
TCTTCTGGGGTGGTGTCCTTAGATGAGGATGAGGCAGAGCCAGAGGAACAGTGACCCACATCA 

TGCCTGGACAGTGACCCACATCATGCCTGGACAGTGACCCACATCATGCCTGGACAGTGACCC 


ACATCATCCTGGACAGTGACCCACATCATGCCTGGACAGTGACCCACATCATGCCTGGACAGT 


GACCCACATCATGCCTGGACAGTGACCCACATCATCCTGGACAGTGACCCACATGATGCCTGG 


ORF Start: ATG at 115 | ORF Stop: TGA at 997 

SEQ ID NO: 44 | 294 aa | MW at 31502.5kD 

NOV11a, CG107095-01 Protein Sequence 


MVLPKVNVGDTVAMLPKSRRALTIQEIAALARSSLHGISQVVKDHVTKPTAMAQGRVAHLIEW 
KGWSKPSDSPAALESAFSSYSDLSEGEQEARFAAGVAEQFAIAEAKLRAWSSVDGEDSTDDSY 
DEDFAGGMDTDMAGQLPLGPHLQDLFTGHRFSRPVRQGSVEPESDCSQTMSPDTLCSSLCSLE 
DGLLGSPARLASQLLGDELLLAKLPPSRESAFRSLGPLEAQDSLYNSPLTESCLSPAEEEPAP 
CKDCQPLCPPLTGSWERQRQASDLASSGVVSLDEDEAEPEEQ 
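
The ORF coordinates in Table 11A can be cross-checked against the stated protein length. The following is a minimal sketch, assuming (as the values in this table suggest) that both coordinates are 1-based and that the ORF stop is the position of the first base of the stop codon. The function name is hypothetical and is not taken from the original analysis.

```python
def orf_codon_count(orf_start: int, orf_stop: int) -> int:
    """Return the number of sense codons between the start and stop codons.

    Assumption: positions are 1-based, orf_start is the first base of the
    ATG, and orf_stop is the first base of the stop codon.
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF coordinates are not in frame")
    return span // 3


# NOV11a: ATG at 115, TGA at 997, reported length 294 aa
nov11a_codons = orf_codon_count(115, 997)
print(nov11a_codons)
```

Under these assumptions the codon count for NOV11a equals the 294 residues reported for SEQ ID NO: 44.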



Further analysis of the NOV11a protein yielded the following properties shown in 
Table 11B. 



Table 11B. Protein Sequence Properties NOV11a 

PSort analysis: | 0.3600 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV11a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 11C. 



Table 11C. Geneseq Results for NOV11a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV11a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAU81979 | Human secreted protein SECP5 - Homo sapiens, 328 aa. [WO200198353-A2, 27-DEC-2001] | 1..294 / 35..328 | 293/294 (99%) / 294/294 (99%) | e-169 
AAM79122 | Human protein SEQ ID NO 1784 - Homo sapiens, 366 aa. [WO200157190-A2, 09-AUG-2001] | 3..294 / 75..366 | 290/292 (99%) / 292/292 (99%) | e-167 
AAG81313 | Human AFP protein sequence SEQ ID NO: 144 - Homo sapiens, 281 aa. [WO200129221-A2, 26-APR-2001] | 14..294 / 1..281 | 279/281 (99%) / 280/281 (99%) | e-161 
AAM80106 | Human protein SEQ ID NO 3752 - Homo sapiens, 382 aa. [WO200157190-A2, 09-AUG-2001] | 3..294 / 91..382 | 267/292 (91%) / 272/292 (92%) | e-153 
AAB66099 | Protein of the invention #11 - Unidentified, 335 aa. [WO200078961-A1, 28-DEC-2000] | 37..294 / 78..335 | 257/258 (99%) / 258/258 (99%) | e-149 
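
The Expect values in these tables follow the abbreviated notation commonly seen in BLAST reports, in which a bare exponent such as e-169 is read as 1e-169. The sketch below converts the printed strings to numbers so that hits can be ranked; the helper name is hypothetical and is not part of any BLAST tool.

```python
def parse_expect(text: str) -> float:
    """Convert a printed Expect value ('e-169', '4e-71', '0.0') to a float."""
    value = text.strip().lower()
    if value.startswith("e"):
        # A bare exponent is interpreted as having a mantissa of 1.
        value = "1" + value
    return float(value)


# Expect values as printed in Table 11C
printed = ["e-169", "e-167", "e-161", "e-153", "e-149"]
expect_values = [parse_expect(v) for v in printed]
# Smaller Expect values correspond to more significant matches.
best = min(expect_values)
```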


In a BLAST search of public sequence databases, the NOV11a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 11D. 


Table 11D. Public BLASTP Results for NOV11a 

Protein Accession Number | Protein/Organism/Length | NOV11a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q8TA84 | Similar to unknown (Protein for IMAGE:3487926) - Homo sapiens (Human), 281 aa. | 14..294 / 1..281 | 280/281 (99%) / 281/281 (99%) | e-161 
CAC38563 | Sequence 143 from Patent WO0129221 - Homo sapiens (Human), 281 aa. | 14..294 / 1..281 | 279/281 (99%) / 280/281 (99%) | e-160 
Q99KY5 | Hypothetical 16.1 kDa protein - Mus musculus (Mouse), 150 aa (fragment). | 143..290 / 2..149 | 130/148 (87%) / 138/148 (92%) | 4e-71 
Q9D390 | 6330503C03Rik protein - Mus musculus (Mouse), 300 aa. | 10..290 / 3..297 | 120/318 (37%) / 159/318 (49%) | 6e-38 
O94871 | KIAA0773 protein - Homo sapiens (Human), 300 aa. | 10..290 / 3..297 | 120/309 (38%) / 160/309 (50%) | 2e-37 



PFam analysis predicts that the NOV11a protein contains the domains shown in 
Table 11E. 



Table 11E. Domain Analysis of NOV11a 

Pfam Domain | NOV11a Match Region | Identities/Similarities for the Matched Region | Expect Value 
lacI | 20..47 | 7/28 (25%) / 18/28 (64%) | 0.68 



Example 12. 

The NOV12 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 12A. 
Table 12A. NOV12 Sequence Analysis 

SEQ ID NO: 45 | 538 bp 

NOV12a, CG107477-01 DNA Sequence 


CATCTGAAGCCAAAGAGATAAACTTTATGCCCAGATTCCCCCTATAGAGAAGATGGATGCATC 


CTTGTCCATGCTTGCTAATTGCGAGAAGCTTTCACTGTCTACAAACTGCATTGAAAAAATTGC 
CAACCTGAATGGCTTAAAAAACTTGAGGATATTATCTTTAGGAAGAAACAACATAAAGAACTT 
AAATGGACTGGAGGCTGTAGGGGACACATTAGAAGAACTGTGGATCTCCTACAATTTTATTGA 
GAAGTTGAAAGGGATCCACATAATGAAGAAATTGAAGATTCTCTACATGTCTAATAACCTGGT 
AAAAGACTGGGCTGAGTTTGTGAAGTTGGCAGAACTGCCATGCCTCGAAGACCTGGTGTTTGT 
AGGCAATCCCTTGGAAGAGAAACATTCTGCTGAGAATAACTGGATTGAAGAAGCAACCAAGAG 
AGTGCCCAAACTGAAAAAGCTGGATGGTACTCCAGTAATTAAAGGGGATGAGGAAGAAGACAA 
CTAATGCCACGCTTTCCACATGTGTGTTAACTTA 




ORF Start: ATG at 53 | ORF Stop: TAA at 506 

SEQ ID NO: 46 | 151 aa | MW at 17095.0kD 

NOV12a, CG107477-01 Protein Sequence 


MDASLSMLANCEKLSLSTNCIEKIANLNGLKNLRILSLGRNNIKNLNGLEAVGDTLEELWISY 
NFIEKLKGIHIMKKLKILYMSNNLVKDWAEFVKLAELPCLEDLVFVGNPLEEKHSAENNWIEE 
ATKRVPKLKKLDGTPVIKGDEEEDN 




SEQ ID NO: 47 | 633 bp 

NOV12b, CG107477-02 DNA Sequence 


AGTAGCAACCGCCGGAATGGCGAAAGCAACAACAATCAAAGAAGCCTTAGCGAGATGGGAAGA 
GAAAACTGGCCAGAGGCCATCTGAAGCCAAAGAGATAAAACTTTATGCCCAGATTCCCCCTAT 
AGAGAAGATGGATGCATCCTTGTCCATGCTTGCTAATTGCGAGAAGCTTTCACTGTCTACAAA 
CTGCATTGAAAAAATTGCCAACCTGAATGGCTTAAGAGGCAGTAGGGGACACATTAGAAGAAC 
TGTGGATCTCCTACAATTTTATTGAGAAGTTGAAAGGGATCCACATAATGAAGAAATTGAAGA 


TTCTCTACATGTCTAATAACCTGGTAAAAGACTGGGCTGAGTTTGTGAAGCTGGCAGAACTGC 


CATGCCTCGAAGACCTGGTGTTTGTAGGCAATCCCTTGGAAGAGAAACATTCTGCTGAGAATA 


ACTGGATTGAAGAAGCAACCAAGAGAGTGCCCAAACTGAAAAAGCTGGATGGTACTCCAGTAA 


TTAAAGGGGATGAGGAAGAAGACAACTAATGCCACGCTTTCCACTGTGTGTTAACTTATTTAA 


ATGTCATAAGAACAATAGATAAATTTTATATAATTGTCTATTTTAAAAAAAAAAAAAAAAAAA 


AAA 




ORF Start: ATG at 17 | ORF Stop: TGA at 275 

SEQ ID NO: 48 | 86 aa | MW at 9691.1kD 

NOV12b, CG107477-02 Protein Sequence 


MAKATTIKEALARWEEKTGQRPSEAKEIKLYAQIPPIEKMDASLSMLANCEKLSLSTNCIEKI 
ANLNGLRGSRGHIRRTVDLLQFY 
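
The relationship between the DNA and protein sequences in Table 12A can be illustrated with a short translation sketch. It assumes the standard genetic code and a 1-based ORF start, and it stops at the first in-frame stop codon. The function and variable names are hypothetical; the fragment is copied from the NOV12a DNA sequence around the reported ATG.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start up to the first stop codon."""
    seq = dna.upper().replace(" ", "")  # tolerate stray spaces from extraction
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Six bases upstream of the ATG plus the first twelve codons of NOV12a
fragment = "GAGAAGATGGATGCATCCTTGTCCATGCTTGCTAATTGCGAG"
print(translate_orf(fragment, 7))
```

The translated fragment can be compared with the first twelve residues of the NOV12a protein sequence listed in the table. Ambiguous bases are not handled in this sketch and would raise a KeyError.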



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 12B. 



Table 12B. Comparison of NOV12a against NOV12b. 

Protein Sequence | NOV12a Residues/Match Residues | Identities/Similarities for the Matched Region 
NOV12b | 1..34 / 40..73 | 31/34 (91%) / 32/34 (93%) 
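
The residue ranges in Table 12B are inclusive, so the NOV12a span 1..34 and the NOV12b span 40..73 each cover 34 residues, which agrees with the denominator of the identity fraction. A brief sketch of that bookkeeping follows; the helper names are hypothetical.

```python
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Parse an inclusive residue range such as '40..73' into (start, end)."""
    start, end = text.replace(" ", "").split("..")
    return int(start), int(end)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive range."""
    start, end = parse_range(text)
    return end - start + 1


query_span = span_length("1..34")    # NOV12a residues
match_span = span_length("40..73")   # NOV12b residues
```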



Further analysis of the NOV12a protein yielded the following properties shown in 
Table 12C. 
Table 12C. Protein Sequence Properties NOV12a 

PSort analysis: | 0.4859 probability located in mitochondrial matrix space; 0.4500 probability located in cytoplasm; 0.1967 probability located in mitochondrial inner membrane; 0.1967 probability located in mitochondrial intermembrane space 
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV12a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 12D. 

Table 12D. Geneseq Results for NOV12a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV12a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAU74331 | Human cytoskeleton-associated protein (CYSKP) #2 - Homo sapiens, 190 aa. [WO200185942-A2, 15-NOV-2001] | 1..151 / 40..190 | 151/151 (100%) / 151/151 (100%) | 2e-83 
AAB41987 | Human ORFX ORF1751 polypeptide sequence SEQ ID NO:3502 - Homo sapiens, 196 aa. [WO200058473-A2, 05-OCT-2000] | 1..151 / 46..196 | 151/151 (100%) / 151/151 (100%) | 2e-83 
ABG12773 | Novel human diagnostic protein #12764 - Homo sapiens, 144 aa. [WO200175067-A2, 11-OCT-2001] | 13..151 / 6..144 | 139/139 (100%) / 139/139 (100%) | 2e-76 
ABB59218 | Drosophila melanogaster polypeptide SEQ ID NO 4446 - Drosophila melanogaster, 188 aa. [WO200171042-A2, 27-SEP-2001] | 1..149 / 40..187 | 85/149 (57%) / 113/149 (75%) | 7e-44 
AAG00733 | Human secreted protein, SEQ ID NO: 4814 - Homo sapiens, 113 aa. [EP1033401-A2, 06-SEP-2000] | 1..87 / 27..113 | 87/87 (100%) / 87/87 (100%) | 2e-43 



In a BLAST search of public sequence databases, the NOV12a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 12E. 



Table 12E. Public BLASTP Results for NOV12a 

Protein Accession Number | Protein/Organism/Length | NOV12a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q9BS43 | Similar to RIKEN cDNA 1700010H15 gene - Homo sapiens (Human), 151 aa. | 1..151 / 1..151 | 151/151 (100%) / 151/151 (100%) | 5e-83 
Q9DAH9 | 1700010H15Rik protein - Mus musculus (Mouse), 151 aa. | 1..151 / 1..151 | 140/151 (92%) / 146/151 (95%) | 3e-77 
Q8T888 | Leucine-rich repeat dynein light chain - Ciona intestinalis, 190 aa. | 1..151 / 40..190 | 120/151 (79%) / 132/151 (86%) | 2e-64 
O44230 | Outer arm dynein light chain 2 - Anthocidaris crassispina (Sea urchin), 199 aa. | 1..151 / 47..197 | 110/151 (72%) / 129/151 (84%) | 1e-57 
Q9V573 | CG8800 protein - Drosophila melanogaster (Fruit fly), 188 aa. | 1..149 / 40..187 | 85/149 (57%) / 113/149 (75%) | 2e-43 



PFam analysis predicts that the NOV12a protein contains the domains shown in 
Table 12F. 



Table 12F. Domain Analysis of NOV12a 

Pfam Domain | NOV12a Match Region | Identities/Similarities for the Matched Region | Expect Value 



Example 13. 

The NOV13 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 13A. 



Table 13A. NOV13 Sequence Analysis 

SEQ ID NO: 49 | 1644 bp 

NOV13a, CG108707-01 DNA Sequence 


GATATCCCGAGATTAGGTCCCCAGCTTCCAAAGAGAGGATCAGAATGTCTCAGGATAATGACA 


CATTGATGAGAGACATCCTGGGGCATGCGCTCGCTGCTATGAGGCTGCAGAAGCTGGAACAGC 
AGCGGCGGCTGTTTGAAAAGAAGCAGCGACAGAAGCGCCAGGAGCTCCTCATGGTTCAGGCCA 
ATCCTGACGCTTCCCCGTGGCTTTGGCGCTCTTGTCTGCGGGAGGAGCGCCTTTTAGGTGACA 
GAGGCCTTGGGAACCCTTTCCTCCGGAAGAAAGTGTCAGAGGCACATCTGCCCTCTGGCATCC 
ACAGTGCCCTGGGCACCGTGAGCTGTGGTGGAGACGGCAGGGGCGAGCGCGGCCTCCCGACAC 
CGCGGACAGAAGCAGTGTTCAGGAATCTCGGTCTCCAGTCCCCTTTCTTATCCTGGCTCCCAG 
ACAATTCCGATGCAGAATTGGAGGAAGTCTCCGTGGAGAATGGTTCCGTCTCTCCCCCACCTT 
TTAAACAGTCTCCGAGAATCCGACGCAAGGGTTGGCAAGCCCACCAACGACCTGGGACCCGTG 
CAGAGGGTGAGAGTGACTCCCAGGATATGGGAGATGCACACAAGTCACCCAATATGGGACCAA 
ACCCTGGAATGGATGGTGACTGTGTATATGAAAACTTGGCCTTCCAAAAGGAAGAAGACTTGG 
AAAAGAAGAGAGAGGCCTCTGAGTCTACAGGGACGAACTCCTCAGCAGCACACAACGAAGAGT 
TGTCCAAGGCCCTGAAAGGCGAGGGTGGCACGGACAGCGACCATATGAGGCACGAAGCCTCCT 
TGGCAATCCGCTCCCCCTGCCCTGGGCTGGAGGAGGACATGGAAGCCTACGTGCTGCGGCCAG 
CGCTCCCGGGCACCATGATGCAGTGCTACCTCACCCGTGACAAGCACGGCGTGGACAAGGGCT 
TGTTCCCCCTCTACTACCTCTACCTGGAGACCTCTGACAGCCTGCAGCGCTCCCTCCTGGCTG 
GGCGAAAGAGAAGAAGGAGCAAAACTTCTAATTACCTCATCTCCCTGGATCCTACACACCTAT 
CTCGGGACGGGGACAATTTCGTGGGCAAAGTCAGATCCAATGTCTTCAGCACCAAGTTCACCA 

TCTTTGACAATGGGGTGAATCCTGACCGGGAGCATTTAACCAGGAATACTGCCCGGATCAGAC 
AGGAGCTGGGGGCTGTGTGTTATGAGCCCAACGTCTTAGGATACCTGGGGCCTCGGAAAATGA 
CTGTGATTCTCCCAGGAACCAACAGCCAGAACCAGCGAATCAATGTCCAGCCACTAAATGAAC 
AGGAGTCGCTACTGAGTCGTTACCAACGTGGGGACAAACAAGGGTTGCTTTTGTTGCACAACA 
AAACCCCGTCGTGGGACAAGGAGAACGGTGTCTACACGCTCAATTTCTATGGTCGAGTCACTC 
GGGCTTCGGTGAAGAACTTCCAAATCGTGGATCCCAAACACCGTGAGCTCCTGGAAACCAGTT 
TAGCCGGGCCAGAAGAACATCTGGTGCTCCAGTTCGGCCGAGTGGGCCCAGACACATTCACCA 
TGGACTTCTGCTTTCCATTTAGCCCGCTCCAGGCCTTCAGCATCTGCTTGTCCAGTTTCAATT 
AGAAGC 


ORF Start: ATG at 45 | ORF Stop: TAG at 1638 

SEQ ID NO: 50 | 531 aa | MW at 59739.7kD 

NOV13a, CG108707-01 Protein Sequence 

MSQDNDTLMRDILGHALAAMRLQKLEQQRRLFEKKQRQKRQELLMVQANPDASPWLWRSCLRE 
ERLLGDRGLGNPFLRKKVSEAHLPSGIHSALGTVSCGGDGRGERGLPTPRTEAVFRNLGLQSP 
FLSWLPDNSDAELEEVSVENGSVSPPPFKQSPRIRRKGWQAHQRPGTRAEGESDSQDMGDAHK 
SPNMGPNPGMDGDCVYENLAFQKEEDLEKKREASESTGTNSSAAHNEELSKALKGEGGTDSDH 
MRHEASLAIRSPCPGLEEDMEAYVLRPALPGTMMQCYLTRDKHGVDKGLFPLYYLYLETSDSL 
QRSLLAGRKRRRSKTSNYLISLDPTHLSRDGDNFVGKVRSNVFSTKFTIFDNGVNPDREHLTR 
NTARIRQELGAVCYEPNVLGYLGPRKMTVILPGTNSQNQRINVQPLNEQESLLSRYQRGDKQG 
LLLLHNKTPSWDKENGVYTLNFYGRVTRASVKNFQIVDPKHRELLETSLAGPEEHLVLQFGRV 
GPDTFTMDFCFPFSPLQAFSICLSSFN 



Further analysis of the NOV13a protein yielded the following properties shown in 
Table 13B. 



Table 13B. Protein Sequence Properties NOV13a 

PSort analysis: | 0.6000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen) 
SignalP analysis: | No Known Signal Sequence Predicted 
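
The PSort summary lines in these tables share a fixed phrasing, so the predicted localizations can be extracted and ranked mechanically. The following is a minimal sketch under that assumption; the function name is hypothetical, and the input string is the NOV13a entry from Table 13B.

```python
import re


def parse_psort(text: str):
    """Return (value, location) pairs from a PSort summary string."""
    pattern = re.compile(r"(\d\.\d+)\s+probability\s+located\s+in\s+([^;]+)")
    return [(float(p), loc.strip()) for p, loc in pattern.findall(text)]


psort_nov13a = (
    "0.6000 probability located in nucleus; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)
predictions = parse_psort(psort_nov13a)
# Tuples compare by their first element, so max() picks the highest value.
top_value, top_location = max(predictions)
```

The values need not sum to one across locations (compare the NOV14a entry in Table 14B), so they are better treated as per-location scores than as a probability distribution.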






A search of the NOV13a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 13C. 



Table 13C. Geneseq Results for NOV13a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV13a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAB26906 | Human TULP2 protein - Homo sapiens, 520 aa. [US6114502-A, 05-SEP-2000] | 1..531 / 1..520 | 515/531 (96%) / 517/531 (96%) | 0.0 
AAW36491 | Human TULP2 protein - Homo sapiens, 520 aa. [WO9738004-A1, 16-OCT-1997] | 1..531 / 1..520 | 515/531 (96%) / 517/531 (96%) | 0.0 
AAB26908 | Mouse TULP4 protein - Mus sp, 506 aa. [US6114502-A, 05-SEP-2000] | 45..531 / 1..500 | 267/516 (51%) / 326/516 (62%) | e-126 
AAW36494 | Human TULP4 protein - Homo sapiens, 506 aa. [WO9738004-A1, 16-OCT-1997] | 45..531 / 1..500 | 267/516 (51%) / 326/516 (62%) | e-126 
AAB26901 | Mouse tub Form II protein - Mus sp, 505 aa. [US6114502-A, 05-SEP-2000] | 5..531 / 7..499 | 200/547 (36%) / 299/547 (54%) | 1e-86 



In a BLAST search of public sequence databases, the NOV13a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 13D. 



Table 13D. Public BLASTP Results for NOV13a 

Protein Accession Number | Protein/Organism/Length | NOV13a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q8TC50 | Tubby like protein 2 - Homo sapiens (Human), 520 aa. | 1..531 / 1..520 | 516/531 (97%) / 518/531 (97%) | 0.0 
O00295 | Tubby related protein 2 (Tubby-like protein 2) - Homo sapiens (Human), 520 aa. | 1..531 / 1..520 | 515/531 (96%) / 517/531 (96%) | 0.0 
P46686 | Tubby related protein 2 (Tubby-like protein 2) (P4-6 protein) - Mus musculus (Mouse), 564 aa (fragment). | 2..531 / 16..558 | 292/559 (52%) / 359/559 (63%) | e-140 
S42728 | phosphodiesterase (clone p4-6) - mouse, 271 aa. | 262..531 / 7..265 | 175/270 (64%) / 205/270 (75%) | 4e-95 
P50586 | Tubby protein - Mus musculus (Mouse), 505 aa. | 5..531 / 7..499 | 200/547 (36%) / 299/547 (54%) | 3e-86 
PFam analysis predicts that the NOV13a protein contains the domains shown in 
Table 13E. 



Table 13E. Domain Analysis of NOV13a 

Pfam Domain | NOV13a Match Region | Identities/Similarities for the Matched Region | Expect Value 
Tub | 279..531 | 127/307 (41%) / 238/307 (78%) | 1e-212 
Example 14. 

The NOV14 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 14A. 



Table 14A. NOV14 Sequence Analysis 

SEQ ID NO: 51 | 3937 bp 

NOV14a, CG108791-01 DNA Sequence 


TGCGCTGACAGCAGCCATGGCGAGCGGCAGTGGAGACAGCGTCACCCGTCGGAGCGTGGCATC 
ACAGTTTTTCACTCAAGAGGAGGGGCCGGGCATCGATGGCATGACCACCTCAGAGAGGGTGGT 
GGATCTTCTGAACCAGGCGGTGCTGATCACCAATGACTCAAAGATCACAGTGCTCAAACAGGT 
CCAGGAGCTGATCATCAACAAAGACCCCACACTACTGGACAACTTCCTGGATGAGATCATCGC 
ATTCCAAGCAGACAAGTCAATCGAAGTGCGAAAATTTGTCATCGGCTTCATCGAGGAGGCATG 
CAAGCGAGACATCGAGTTGCTGCTGAAACTCATTGCAAACCTCAACATGCTCTTGAGGGACGA 
GAATGTGAACGTGGTGAAGAAGGCTATCCTCACCATGACCCAGCTCTACAAGGTGGCCCTGCA 
GTGGATGGTAAAGTCACGGGTCATTAGCGAGCTACAGGAGGCCTGCTGGGACATGGTATCTGC 
CATGGCGGGGGACATCATCCTGCTATTGGACTCTGACAATGACGGCATCCGCACCCACGCCAT 
CAAGTTTGTGGAGGGCCTCATTGTCACCCCGTCACCCCGCATGGCTGACTCAGAGATACCCCG 
ACGCCAGGAGCATGATATCAGCCTGGACCGCATCCCTCGTGACCACCCCTACATCCAGTACAA 
CGTGCTATGGGAAGAGGGCAAGGCAGCCTTGGAGCAGCTGCTTAAGTTCATGGTGCACCCTGC 
CATCTCCTCCATCAACCTGACCACAGCGCTGGGCTCCCTTGCCAATATCGCCCGCCAGAGACC 
CATGTTCATGTCTGAGGTGATCCAGGCCTATGAAACTCTGCATGCCAACCTGCCCCCGACGCT 
GGCCAAATCGCAGGTGAGCAGTGTGCGTAAGAATCTGAAGCTGCACCTGTTGAGTGTGCTGAA 
GCACCCGGCTTCCTTGGAGTTCCAGGCCCAGATCACCACCCTGCTGGTGGACCTGGGCACACC 
TCAGGCCGAGATCGCCCGCAACATGCCGAGCAGCAAGGACACCCGCAAGCGGCCCCGCGATGA 
CTCGGACTCCACACTCAAGAAGATGAAGCTGGAGCCCAACCTGGGGGAGGACGATGAGGACAA 
AGACTTGGAGCCAGGCCCGTCGGGGACCTCGAAGGCCTCAGCGCAGATCTCCGGCCAGTCAGA 
CACGGACATCACAGCTGAGTTCCTGCAGCCTCTGCTGACGCCTGATAATGTGGCTAATCTGGT 
CCTCATCAGCATGGTGTACCTACCCGAGGCCATGCCAGCCTCCTTCCAGGCCATCTACACCCC 
CGTGGAGTCAGCAGGCACGGAAGCCCAGATCAAGCACCTGGCTCGGCTCATGGCCACACAGAT 
GACAGCTGCCGGACTGGGACCAGGTGTAGAGCAGACCAAACAGTGCAAGGAGGAGCCCAAGGA 
GGAGAAGGTGGTGAAGACAGAGAGCGTCCTGATCAAGCGGCGCCTGTCAGCCCAGGGCCAAGC 
CATCTCGGTGGTGGGTTCCCTGAGCTCCATGTCCCCCCTGGAGGAAGAGGCACCGCAGGCCAA 
GAGGAGGCCAGAGCCCATTATCCCTGTCACTCAGCCCCGGCTGGCAGGCGCTGGTGGGCGCAA 
GAAAATTTTCCGTCTCAGCGACGTGCTGAAGCCCCTTACCGATGCCCAGGTGGAAGCCATGAA 
GCTGGGCGCTGTGAAGCGGATCCTGCGGGCTGAGAAGGCTGTGGCCTGCAGCGGGGCAGCCCA 
GGTCCGCATAAAGATCCTGGCCAGCCTGGTGACACAGTTCAACTCGGGCCTGAAGGCGGAGGT 
CCTGTCCTTCATCCTGGAGGATGTGCGGGCCCGCCTGGACCTGGCCTTCGCCTGGCTCTACCA 
GGAGTACAACGCCTACCTGGCCGCAGGTGCCTCGGGCTCCCTGGACAAGTATGAGGACTGCCT 
CATCCGCCTGTTGTCTGGCCTGCAGGAGAAACCAGACCAGAAGGATGGGATCTTCACCAAGGT 
TGTGCTGGAGGCGCCACTCATCACAGAGAGTGCCCTGGAGGTGGTCCGCAAGTACTGCGAGGA 
TGAGAGTCGCACCTATCTGGGCATGTCCACACTTCGAGACCTGATCTTCAAGCGCCCGTCCCG 
CCAGTTCCAGTACCTGCATGTCCTCCTCGACCTCAGCTCCCATGAGAAGGACAAGGTGCGCTC 
CCAGGCCCTGCTGTTCATCAAACGCATGTATGAGAAGGAGCAGCTGCGGGAGTATGTGGAGAA 
ATTTGCCCTCAACTACCTGCAGCTCCTGGTGCACCCCAACCCACCGTCTGTGCTGTTTGGAGC 
TGACAAGGACACAGAGGTGGCAGCACCCTGGACGGAGGAGACAGTGAAGCAGTGTCTGTACCT 
CTACCTGGCCCTCCTGCCTCAGAACCACAAGCTGATCCACGAACTGGCGGCCGTGTACACTGA 
AGCCATCGCCGACATCAAGCGGACGGTGCTGAGGGTCATTGAGCAGCCGATCCGAGGAATGGG 
CATGAACTCCCCGGAGCTGCTCCTGCTGGTGGAAAATTGTCCCAAGGGAGCAGAGACACTGGT 
CACGAGATGTCTGCACAGCCTCACAGACAAAGTCCCACCCTCCCCAGAGCTGGTGAAGCGGGT 
CCGGGATCTCTACCACAAGCGACTGCCAGACGTCCGCTTCCTCATCCCGGTGCTCAATGGGCT 
GGAGAAGAAAGAGGTGATCCAGGCCCTGCCTAAACTCATCAAACTCAACCCCATCGTGGTGAA 
GGAAGTCTTCAACCGCCTGCTGGGCACCCAGCATGGTGAGGGAAACTCAGCCTTGTCCCCGCT 
GAACCCTGGAGAGCTCCTGATCGCATTACACAACATTGACTCCGTGAAGTGCGACATGAAATC 
CATCATCAAAGCCACCAACCTGTGCTTTGCGGAGCGGAACGTGTACACGTCAGAGGTGCTGGC 
CGTGGTGATGCAGCAGCTGATGGAGCAGAGCCCCCTGCCCATGCTGCTCATGAGGACCGTCAT 
CCAGTCCCTGACCATGTACCCCCGCCTGGGGGGCTTCGTCATGAACATCCTGTCCCGCCTCAT 
CATGAAGCAGGTGTGGAAGTACCCCAAGGTGTGGGAGGGCTTCATCAAGTGCTGCCAGCGCAC 
AAAGCCCCAGAGCTTCCAGGTCATCCTGCAGCTGCCGCCCCAGCAGCTGGGAGCCGTCTTTGA 
CAAGTGCCCAGAGCTCCGGGAGCCCCTGCTGGCCCATGTCCGCTCCTTCACCCCCCACCAGCA 
AGCTCACATCCCTAACTCCATCATGACCATCTTGGAGGCCAGCGGCAAGCAGGAGCCAGAGGC 
CAAGGAGGCGCCTGCGGGGCCCTTGGAGGAGGATGATCTGGAGCCCCTGACCTTGGCCCCGGC 
CCCAGCACCCCGGCCCCCTCAGGACCTCATCGGCCTGCGACTGGCCCAGGAGAAGGCCTTAAA 
GCGGCAGCTGGAGGAGGAACAGAAGCTGAAGCCGGGAGGAGTGGGAGCCCCCTCCTCTTCCTC 
CCCCTCTCCCTCTCCGTCGGCCCGGCCAGGCCCGCCCCCGTCTGAGGAAGCCATGGATTTCCG 
GGAGGAGGGGCCTGAGTGCGAGACCCCGGGCATCTTCATCAGCATGGATGACGACTCGGGGCT 
GACCGAGGCCGCGCTGTTGGACTCTAGTCTCGAGGGCCCCCTACCCAAGGAGACGGCAGCGGG 
CGGGCTGACCTTGAAGGAGGAGCGGAGCCCCCAGACCCTCGCACCTGTTGGAGAAGATGCTAT 
GAAGACTCCCAGCCCGGCTGCCGAGGACGCCAGGGAACCCGAGGCCAAGGGGAACAGCTGACG 
GGGCTCGAGGGGGAAAGGGGGTGGGACAGGGACTCGGGGCTGGGGGACGGGGCGGGGCTTGAC 


CTGCGGGTGCTTTGCCTTAAAAAGAAATAAA 




ORF Start: ATG at 17 | ORF Stop: TGA at 3839 

SEQ ID NO: 52 | 1274 aa | MW at 141158.5kD 

NOV14a, CG108791-01 Protein Sequence 


MASGSGDSVTRRSVASQFFTQEEGPGIDGMTTSERVVDLLNQAVLITNDSKITVLKQVQELII 
NKDPTLLDNFLDEIIAFQADKSIEVRKFVIGFIEEACKRDIELLLKLIANLNMLLRDENVNVV 
KKAILTMTQLYKVALQWMVKSRVISELQEACWDMVSAMAGDIILLLDSDNDGIRTHAIKFVEG 
LIVTPSPRMADSEIPRRQEHDISLDRIPRDHPYIQYNVLWEEGKAALEQLLKFMVHPAISSIN 
LTTALGSLANIARQRPMFMSEVIQAYETLHANLPPTLAKSQVSSVRKNLKLHLLSVLKHPASL 
EFQAQITTLLVDLGTPQAEIARNMPSSKDTRKRPRDDSDSTLKKMKLEPNLGEDDEDKDLEPG 
PSGTSKASAQISGQSDTDITAEFLQPLLTPDNVANLVLISMVYLPEAMPASFQAIYTPVESAG 
TEAQIKHLARLMATQMTAAGLGPGVEQTKQCKEEPKEEKVVKTESVLIKRRLSAQGQAISVVG 
SLSSMSPLEEEAPQAKRRPEPIIPVTQPRLAGAGGRKKIFRLSDVLKPLTDAQVEAMKLGAVK 
RILRAEKAVACSGAAQVRIKILASLVTQFNSGLKAEVLSFILEDVRARLDLAFAWLYQEYNAY 
LAAGASGSLDKYEDCLIRLLSGLQEKPDQKDGIFTKVVLEAPLITESALEVVRKYCEDESRTY 
LGMSTLRDLIFKRPSRQFQYLHVLLDLSSHEKDKVRSQALLFIKRMYEKEQLREYVEKFALNY 
LQLLVHPNPPSVLFGADKDTEVAAPWTEETVKQCLYLYLALLPQNHKLIHELAAVYTEAIADI 
KRTVLRVIEQPIRGMGMNSPELLLLVENCPKGAETLVTRCLHSLTDKVPPSPELVKRVRDLYH 
KRLPDVRFLIPVLNGLEKKEVIQALPKLIKLNPIVVKEVFNRLLGTQHGEGNSALSPLNPGEL 
LIALHNIDSVKCDMKSIIKATNLCFAERNVYTSEVLAVVMQQLMEQSPLPMLLMRTVIQSLTM 
YPRLGGFVMNILSRLIMKQVWKYPKVWEGFIKCCQRTKPQSFQVILQLPPQQLGAVFDKCPEL 
REPLLAHVRSFTPHQQAHIPNSIMTILEASGKQEPEAKEAPAGPLEEDDLEPLTLAPAPAPRP 
PQDLIGLRLAQEKALKRQLEEEQKLKPGGVGAPSSSSPSPSPSARPGPPPSEEAMDFREEGPE 
CETPGIFISMDDDSGLTEAALLDSSLEGPLPKETAAGGLTLKEERSPQTLAPVGEDAMKTPSP 
AAEDAREPEAKGNS 



Further analysis of the NOV14a protein yielded the following properties shown in 
Table 14B. 



Table 14B. Protein Sequence Properties NOV14a 

PSort analysis: | 0.8528 probability located in nucleus; 0.5806 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.2922 probability located in mitochondrial inner membrane 
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV14a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 14C. 
Table 14C. Geneseq Results for NOV14a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV14a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
ABB58861 | Drosophila melanogaster polypeptide SEQ ID NO [illegible] - Drosophila melanogaster, 1116 aa. [WO200171042-A2, 27-SEP-2001] | 7..1103 / [illegible] | 411/1118 (36%) / [illegible] | 0.0 
AAG38593 | Arabidopsis thaliana protein fragment [SEQ ID NO illegible] - Arabidopsis thaliana, 1250 aa. [EP1033405-A2, 06-SEP-2000] | 553..1113 / [illegible] | 178/625 (28%) / [illegible] | 2e-66 
AAG38592 | Arabidopsis thaliana protein fragment SEQ ID NO: 47633 - Arabidopsis thaliana, 1291 aa. [EP1033405-A2, 06-SEP-2000] | 553..1113 / 653..1269 | 178/625 (28%) / 310/625 (49%) | 2e-66 
AAG38591 | Arabidopsis thaliana protein fragment SEQ ID NO: 47632 - Arabidopsis thaliana, 1371 aa. [EP1033405-A2, 06-SEP-2000] | 553..1113 / 733..1349 | 178/625 (28%) / 310/625 (49%) | 2e-66 
AAB86463 | Murine HCN2 protein - Mus sp, 863 aa. [WO200159153-A2, 16-AUG-2001] | 1012..1208 / 651..850 | 48/204 (23%) / 75/204 (36%) | 0.005 



In a BLAST search of public sequence databases, the NOV14a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 14D. 



Table 14D. Public BLASTP Results for NOV14a 

Protein Accession Number | Protein/Organism/Length | NOV14a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value 
Q92797 | Symplekin - Homo sapiens (Human), 1142 aa. | 133..1274 / 1..1142 | 1141/1142 (99%) / 1141/1142 (99%) | 0.0 
AAH30214 | Hypothetical 58.9 kDa protein - Homo sapiens (Human), 533 aa. | 1..533 / 1..533 | 531/533 (99%) / 531/533 (99%) | 0.0 
AAM49961 | LD45768p - Drosophila melanogaster (Fruit fly), 1165 aa. | 7..1134 / 2..1151 | 422/1166 (36%) / 649/1166 (55%) | 0.0 
Q9VNH4 | CG2097 protein - Drosophila melanogaster (Fruit fly), 1116 aa. | 7..1103 / 2..1102 | 411/1118 (36%) / 634/1118 (55%) | 0.0 
Q9D990 | 4632415H16Rik protein - Mus musculus (Mouse), 304 aa. | 985..1274 / 1..304 | 246/307 (80%) / 263/307 (85%) | e-136 
PFam analysis predicts that the NOV14a protein contains the domains shown in 
Table 14E. 



Table 14E. Domain Analysis of NOV14a 

Pfam Domain | NOV14a Match Region | Identities/Similarities for the Matched Region | Expect Value 



Example 15. 

The NOV15 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 15A. 



Table 15A. NOV15 Sequence Analysis 

SEQ ID NO: 53 | 959 bp 

NOV15a, CG109247-01 DNA Sequence 

CGGCACGAGCATGGCTACCTCAGAGCTGAGCTGCGAGGTGTCGGAGGAGAACTGTGAGCGCCG 


GGAGGCCTTCTGGGCAGAATGGAAGGATCTGACACTGTCCACACGGCCCGAGGAGGGCTGCTC 
CCTGCATGAGGAGGACACCCAGAGACATGAGACCTACCACCAGCAGGGGCAGTGCCAGGTGCT 
GGTGCAGCGCTCGCCCTGGCTGATGATGCGGATGGGCATCCTCGGCCGTGGGCTGCAGGAGTA 
CCAGCTGCCCTACCAGCGGGTACTGCCGCTGCCCATCTTCACCCCTGCCAAGATGGGCGCCAC 
CAAGGAGGAGCGTGAGGACACCCCCATCCAGCTTCAGGAGCTGCTGGCGCTGGAGACAGCCCT 
GGGTGGCCAGTGTGTGGACCGCCAGGAGGTGGCTGAGATCACAAAGCAGCTGCCCCCTGTGGT 
GCCTGTCAGCAAGCCCGGTGCACTTCGTCGCTCCCTGTCCCGCTCCATGTCCCAGGAAGCACA 
GAGAGGCTGAGAGGGACTGTGACTTGGGCTCCGCTGTGCCCGCCCTGGGCTGGGCCCTTCCTG 


GCTAGGACTGTGGAGGGGAGCTGCTGGCCATGGCTGCTTTGTAGTTTGCCCAGAGTTGGGGGC 
TAGGGGAGGGGGGAGCCAGAGGCCAGGATGCCTGAGCCCCCTGAGTTCCCAAAGGGAGGGTGG 
CAGAGACAGTGGGCACTAAGGGTGGAGAGTTGGGGGCCAGCACAGCTGAGGACCCTCAGCCCC 
AGGAGAAGGGACAAAAGGTACTGGTGAGGGCAAGAGGTGCCTGGGAGGAGTGGCCCTGATCCA 
GGAAAATGTGAGGGGAATCTGGAACGCTCTAGGCAGAAGAAGCTGGGAGGGAGGGGGAGGTGA 
AAAGGGCAGAGGCAAGGATGGTGGGGCCCCCAGCACCCTCTGTTAGTGCCGCAATAAATGCTC 
AATCATGTGCCAGA 


ORF Start: ATG at 11 | ORF Stop: TGA at 512 

SEQ ID NO: 54 | 167 aa | MW at 19051.5kD 

NOV15a, CG109247-01 Protein Sequence 


MATSELSCEVSEENCERREAFWAEWKDLTLSTRPEEGCSLHEEDTQRHETYHQQGQCQVLVQR 
SPWLMMRMGILGRGLQEYQLPYQRVLPLPIFTPAKMGATKEEREDTPIQLQELLALETALGGQ 
CVDRQEVAEITKQLPPVVPVSKPGALRRSLSRSMSQEAQRG 




SEQ ID NO: 55 | 672 bp 

NOV15b, CG109247-02 DNA Sequence 


CGGCACGAGCATGGCTACCTCAGAGCTGAGCTGCGAGGTGTCGGAGGAGAACTGTGAGCGCCG 


GGAGGCCTTCTGGGCAGAATGGAAGGATCTGACACTGTCCACACGGCCCGAGGAGGGCTGCTC 
CCTGCATGAGGAGGACACCCAGAGACATGAGACCTACCACCAGCAGGGGCAGTGCCAGGTGCT 
GGTGCAGCGCTCGCCCTGGCTGATGATGCGGATGGGCATCCTCGGCCGTGGGCTGCAGGAGTA 
CCAGCTGCCCTGGGCTGGGCCCTTCCTGGCTAGGACTGTGGAGGGGAGCTGCTGGCCATGGCT 
GCTTTGTAGTTTGCCCAGAGTTGGGGGCTAGGGGAGGGGGGAGCCAGAGGCCAGGATGCCTGA 




GCCCCCTGAGTTCCCAAAGGGAGGGTGGCAGAGACAGTGGGCACTAAGGGTGGAGAGTTGGGG 




GCCAGCACAGCTGAGGACCCTCAGCCCCAGGAGAAGGGACAAAAGGTACTGGTGAGGGCAAGA 




GGTGCCTGGGAGGAGTGGCCCTGATCCAGGAAAATGTGAGGGGAATCTGGAACGCTCTAGGCA 




GAAGAAGCTGGGAGGGAGGGGGAGGTGAAAAGGGCAGAGGCAAGGATGGTGGGGCCCCCAGCA 




CCCTCTGTTAGTGCCGCAATAAATGCTCAATCATGTGCCAGA 




ORF Start: ATG at 11 | ORF Stop: TAG at 344 

SEQ ID NO: 56 | 111 aa | MW at 12856.4kD 

NOV15b, CG109247-02 Protein Sequence 


MATSELSCEVSEENCERREAFWAEWKDLTLSTRPEEGCSLHEEDTQRHETYHQQGQCQVLVQR 
SPWLMMRMGILGRGLQEYQLPWAGPFLARTVEGSCWPWLLCSLPRVGG 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 15B. 



Table 15B. Comparison of NOV15a against NOV15b. 

Protein Sequence | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Region 
NOV15b | 1..85 / 1..85 | 84/85 (98%) / 85/85 (99%) 
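
The identity count in Table 15B can be reproduced directly from the two protein sequences in Table 15A, since the matched regions are the same length and need no gaps. The sketch below is a simple position-by-position comparison rather than an alignment algorithm; the function and variable names are hypothetical.

```python
def count_identities(a: str, b: str) -> int:
    """Count positions at which two equal-length, ungapped segments agree."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(x == y for x, y in zip(a, b))


# Residues 1..84 are shared by NOV15a and NOV15b (copied from Table 15A)
shared_1_84 = (
    "MATSELSCEVSEENCERREAFWAEWKDLTLSTRPEEGCSLHEEDTQRHETYHQQGQCQVLVQR"
    "SPWLMMRMGILGRGLQEYQLP"
)
nov15a_1_85 = shared_1_84 + "Y"  # residue 85 of NOV15a
nov15b_1_85 = shared_1_84 + "W"  # residue 85 of NOV15b

identities = count_identities(nov15a_1_85, nov15b_1_85)
print(f"{identities}/{len(nov15a_1_85)}")
```

The single difference falls at residue 85, where NOV15a has tyrosine and NOV15b has tryptophan.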



Further analysis of the NOV15a protein yielded the following properties shown in 
Table 15C. 



Table 15C. Protein Sequence Properties NOV15a 

PSort analysis: | 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane) 
SignalP analysis: | No Known Signal Sequence Predicted 



A search of the NOV15a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 15D. 



Table 15D. Geneseq Results for NOV15a 

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value 
AAM40716 | Human polypeptide SEQ ID NO 5647 - Homo sapiens, 170 aa. [WO200153312-A1, 26-JUL-2001] | 1..167 / 4..170 | 167/167 (100%) / 167/167 (100%) | 1e-94 
AAM38930 | Human polypeptide SEQ ID NO 2075 - Homo sapiens, 113 aa. [WO200153312-A1, 26-JUL-2001] | 1..106 / 1..106 | 106/106 (100%) / 106/106 (100%) | 2e-59 
AAW71560 | Human hepatocyte nuclear factor 1 alpha (R131Q mutant) - Homo sapiens, 630 aa. [WO9811254-A1, 19-MAR-1998] | 43..121 / 90..164 | 20/79 (25%) / 39/79 (49%) | 1.6 
AAW71562 | Human hepatocyte nuclear factor 1 alpha (truncated mutant) - Homo sapiens, 415 aa. [WO9811254-A1, 19-MAR-1998] | 43..121 / 90..164 | 19/79 (24%) / 39/79 (49%) | 4.7 
AAW71561 | Human hepatocyte nuclear factor 1 alpha (truncated mutant) - Homo sapiens, 314 aa. [WO9811254-A1, 19-MAR-1998] | 43..121 / 90..164 | 19/79 (24%) / 39/79 (49%) | 4.7 



In a BLAST search of public sequence databases, the NOV15a protein was found to
have homology to the proteins shown in the BLASTP data in Table 15E.



Table 15E. Public BLASTP Results for NOV15a

Protein Accession Number | Protein/Organism/Length | NOV15a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O15273 | Telethonin (Titin cap protein) - Homo sapiens (Human), 167 aa. | 1..167/1..167 | 167/167 (100%); 167/167 (100%) | 4e-94
Q96L27 | Titin-cap (telethonin) - Homo sapiens (Human), 167 aa. | 1..167/1..167 | 166/167 (99%); 166/167 (99%) | 3e-93
O70548 | Telethonin (Titin cap protein) - Mus musculus (Mouse), 167 aa. | 1..167/1..167 | 151/167 (90%); 163/167 (97%) | 4e-86
O70549 | Telethonin - Mus musculus (Mouse), 167 aa. | 1..167/1..167 | 150/167 (89%); 162/167 (96%) | 1e-85
T18863 | hypothetical protein C02D4.2 - Caenorhabditis elegans, 501 aa. | 29..62/308..341 | 13/34 (38%); 21/34 (61%) | 3.4



PFam analysis predicts that the NOV15a protein contains the domains shown in
Table 15F.



Table 15F. Domain Analysis of NOV15a

Pfam Domain | NOV15a Match Region | Identities/Similarities for the Matched Region | Expect Value
Example 16.

The NOV16 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 16A.



Table 16A. NOV16 Sequence Analysis

SEQ ID NO:57    2067 bp

NOV16a, CG110410-01 DNA Sequence
Sequence 


CGAGAGGAGAGCGCGAGAGCCCCAGCCGCGGGCGGGCGGGCGGTGAAGATGGCAGAGGCACCG 


GCTTCCCCGGCCCCGCTCTCTCCGCTCGAAGTGGAGCTGGACCCGGAGTTCGAGCCCCAGAGC 
CGTCCGCGATCCTGTACGTGGCCCCTGCAAAGGCCGGAGCTCCAAGCGAGCCCTGCCAAGCCC 
TCGGGGGAGACGGCCGCCGACTCCATGATCCCCGAGGAGGAGGACGATGAAGACGACGAGGAC 
GGCGGGGGACGGGCCGGGAACGCCTGGGGAAACCTGTCCTACGCGGACCTGATCACCCGCGCC 
ATCG AG AG CTCC C CGG AC AAAPGG PTPAPTPTCJT P PP AG AT P T A CC AP TG G A Tan tc c r t t r* r* 
GTGCCCTACTTCAAGGATAAGGGCGAPAGPAAPAPPTPTGPPPPPTPPAAPAArTr'PATPrrp 
C ACAACCTGT CACTG C ATAGTCG ATTC ATG CGGGTCC AG AATP AGGG A AP TPP P A A fi A P PT rT 
TGGTGGATCATCAACCCTGATGGGGGGAAGAGCGGAAAAGCCCCCCGGCGGCGGGCTGTCTCC
ATGGACAATAGCAACAAGTATACCAAGAGCCGTGGCCGCGCAGCCAAGAAGAAGGCAGCCCTG
CAGACAGCCCCCGAATCAGCTGACGACAGTCCCTCCCAGCTCTCCAAGTGGCCTGGCAGCCCC 
ACGTCACGCAGCAGTGATGAGCTGGATGCGTGGACGGACTTCCGTTCACGCACCAATTCTAAC 
GCCAGCACAGTCAGTGGCCGCCTGTCGCCCATCATGGCAAGCACAGAGTTGGATGAAGTCCAG 
GACGATGATGCGCCTCTCTCGCCCATGCTCTACAGCAGCTCAGCCAGCCTGTCACCTTCAGTA 
AGCAAGCCGTGCACGGTGGAACTGCCACGGCTGACTGATATGGCAGGCACCATGAATCTGAAT 
GATGGGCTGACTGAAAACCTCATGGACGACCTGCTGGATAACATCACGCTCCCGCCATCCCAG
CCATCGCCCACTGGGGGACTCATGCAGCGGAGCTCTAGCTTCCCGTATACCACCAAGGGCTCG 
GGCCTGGGCTCCCCAACCAGCTCCTTTAACAGCACGGTGTTCGGACCTTCATCTCTGAACTCC 
CTACGCCAGTCTCCCATGCAGACCATCCAAGAGAACAAGCCAGCTACCTTCTCTTCCATGTCA 
CACTATGGTAACCAGACACTCCAGGACCTGCTCACTTCGGACTCACTTAGCCACAGCGATGTC 
ATGATGACACAGTCGGACCCCTTGATGTCTCAGGCCAGCACCGCTGTGTCTGCCCAGAATTCC 
CGCCGGAACGTGATGCTTCGCAATGATCCGATGATGTCCTTTGCTGCCCAGCCTAACCAGGGA 
AGTTTGGTCAATCAGAACTTGCTCCACCACCAGCACCAAACCCAGGGCGCTCTTGGTGGCAGC 
CGTGCCTTGTCGAATTCTGTCAGCAACATGGGCTTGAGTGAGTCCAGCAGCCTTGGGTCAGCC 
AAACACCAGCAGCAGTCTCCTGTCAGCCAGTCTATGCAAACCCTCTCGGACTCTCTCTCAGGC 
TCCTCCTTGTACTCAACTAGTGCAAACCTGCCCGTCATGGGCCATGAGAAGTTCCCCAGCGAC 
TTGGACCTGGACATGTTCAATGGGAGCTTGGAATGTGACATGGAGTCCATTATCCGTAGTGAA 
CTCATGGATGCTGATGGGTTGGATTTTAACTTTGATTCCCTCATCTCCACACAGAATGTTGTT 
GGTTTGAACGTGGGGAACTTCACTGGTGCTAAGCAGGCCTCATCTCAGAGCTGGGTGCCAGGC 
TGAAGGATCACTGAGGAAGGGGAAGTGGGCAAAGCAGACCCTCAAACTGACACAAGACCTACA 


GAGAAAACCCTTTGCCAAATCTGCTCTCAGCAAGTGGACAGTGATACCGTTTACAGCTTAACA 


CCTTTGTGAATCCCACGCCATTTTCCTAACCCAGCAGAGACTGTTAATGGCCCCTTACCCTGG 


GTGAAGCACTTACCCTTGGAACAGAACTCTAAAAAGTATGCAAAATCTTCC 




ORF Start: ATG at 49    ORF Stop: TGA at 1828

SEQ ID NO:58    593 aa    MW at 63891.6kD

NOV16a, CG110410-01 Protein Sequence


MAEAPASPAPLSPLEVELDPEFEPQSRPRSCTWPLQRPELQASPAKPSGETAADSMIPEEEDD 
EDDEDGGGRAGNAWGNLSYADLITRAIESSPDKRLTLSQIYEWMVRCVPYFKDKGDSNSSAGW 
KNSIRHNLSLHSRFMRVQNEGTGKSSWWIINPDGGKSGKAPRRRAVSMDNSNKYTKSRGRAAK 
KKAALQTAPESADDSPSQLSKWPGSPTSRSSDELDAWTDFRSRTNSNASTVSGRLSPIMASTE 
LDEVQDDDAPLSPMLYSSSASLSPSVSKPCTVELPRLTDMAGTMNLNDGLTENLMDDLLDNIT
LPPSQPSPTGGLMQRSSSFPYTTKGSGLGSPTSSFNSTVFGPSSLNSLRQSPMQTIQENKPAT
FSSMSHYGNQTLQDLLTSDSLSHSDVMMTQSDPLMSQASTAVSAQNSRRNVMLRNDPMMSFAA
QPNQGSLVNQNLLHHQHQTQGALGGSRALSNSVSNMGLSESSSLGSAKHQQQSPVSQSMQTLS
DSLSGSSLYSTSANLPVMGHEKFPSDLDLDMFNGSLECDMESIIRSELMDADGLDFNFDSLIS
TQNVVGLNVGNFTGAKQASSQSWVPG
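The ORF coordinates and the protein length reported in Table 16A can be cross-checked arithmetically. The stop codon immediately follows the last coding codon, so its first base should sit at the ORF start plus three times the number of residues. The sketch below illustrates that check; the function name `expected_stop_position` is hypothetical and not part of the disclosure.

```python
def expected_stop_position(orf_start: int, protein_length: int) -> int:
    """Return the 1-based position of the first base of the stop codon.

    orf_start is the 1-based position of the first base of the first codon;
    protein_length is the number of encoded residues (stop codon excluded).
    """
    return orf_start + 3 * protein_length


# NOV16a: the ORF starts at position 49 and encodes 593 residues.
print(expected_stop_position(49, 593))  # 1828
```

The same relation holds for the other coordinates given in this section, for example an ORF starting at position 1 that encodes 107 residues has its stop codon at position 322.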



Further analysis of the NOV16a protein yielded the following properties shown in
Table 16B.
Table 16B. Protein Sequence Properties NOV16a

PSort analysis: 0.3000 probability located in nucleus; 0.1000 probability located in mitochondrial
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability
located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV16a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 16C.



Table 16C. Geneseq Results for NOV16a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV16a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAY96449 | Forkhead transcription factor FKHRL1 - Homo sapiens, 673 aa. [WO200031291-A1, 02-JUN-2000] | 75..593/155..673 | 519/519 (100%); 519/519 (100%) | 0.0
AAB56951 | Human prostate cancer antigen protein sequence SEQ ID NO: 1529 - Homo sapiens, 233 aa. [WO200055174-A1, 21-SEP-2000] | 367..593/7..233 | 227/227 (100%); 227/227 (100%) | e-126
AAY96448 | Forkhead transcription factor FKHR - Homo sapiens, 655 aa. [WO200031291-A1, 02-JUN-2000] | 75..593/158..655 | 256/530 (48%); 341/530 (64%) | e-120
AAB06076 | Human homologue of Caenorhabditis elegans DAF-16 - Homo sapiens, 655 aa. [WO200033068-A1, 08-JUN-2000] | 75..593/158..655 | 256/530 (48%); 341/530 (64%) | e-120
ABG20865 | Novel human diagnostic protein #20856 - Homo sapiens, 837 aa. [WO200175067-A2, 11-OCT-2001] | 127..593/392..837 | 208/478 (43%); 291/478 (60%) | 3e-90
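The Expect Value column mixes ordinary notation (3e-90, 0.0) with the shorthand e-126, which older BLAST reports use for 1e-126. When such values are compared or sorted programmatically, the shorthand has to be expanded first. A minimal sketch, assuming that interpretation; `parse_expect` is a hypothetical helper name:

```python
def parse_expect(value: str) -> float:
    """Convert an Expect value string such as 'e-126', '3e-90' or '0.0' to float."""
    text = value.strip().lower()
    if text.startswith("e"):
        # Shorthand with an implicit mantissa of 1, e.g. 'e-126' -> '1e-126'.
        text = "1" + text
    return float(text)


# Expect values from Table 16C, sorted from most to least significant.
values = ["0.0", "e-126", "e-120", "e-120", "3e-90"]
print(sorted(values, key=parse_expect))
```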



In a BLAST search of public sequence databases, the NOV16a protein was found to
have homology to the proteins shown in the BLASTP data in Table 16D.



Table 16D. Public BLASTP Results for NOV16a

Protein Accession Number | Protein/Organism/Length | NOV16a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O43524 | Forkhead box protein O3A (Forkhead in rhabdomyosarcoma-like 1) (AF6q21 protein) - Homo sapiens (Human), 673 aa. | 75..593/155..673 | 519/519 (100%); 519/519 (100%) | 0.0
Q9WVH4 | Forkhead protein FKHR2 - Mus musculus (Mouse), 672 aa. | 39..593/106..672 | 506/567 (89%); 523/567 (91%) | 0.0
Q9BZ04 | BA653O20.1 (forkhead box O3A (forkhead Drosophila homolog like 1, FKHRL1)) - Homo sapiens (Human), 484 aa. | 127..593/18..484 | 466/467 (99%); 467/467 (99%) | 0.0
Q90YK2 | Forkhead protein xFKHR1 - Xiphophorus maculatus (Southern platyfish), 664 aa. | 1..593/1..664 | 367/684 (53%); 439/684 (63%) | e-176
Q9W7F8 | Forkhead protein FKHR - Brachydanio rerio (Zebrafish) (Zebra danio), 651 aa. | 10..593/8..651 | 370/662 (55%); 443/662 (66%) | e-172
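The residue ranges in these tables use the form start..end with inclusive 1-based coordinates. The sketch below parses such a range and computes the number of residues it spans; the helper names are hypothetical. For an ungapped hit the two spans equal the alignment length (75..593 and 155..673 both span 519 residues, matching 519/519). For a gapped hit the alignment length can exceed either span.

```python
from typing import Tuple


def parse_range(text: str) -> Tuple[int, int]:
    """Parse a residue range such as '75..593' into (start, end)."""
    start, end = text.replace(" ", "").split("..")
    return int(start), int(end)


def span_length(text: str) -> int:
    """Number of residues covered by an inclusive start..end range."""
    start, end = parse_range(text)
    return end - start + 1


print(span_length("75..593"))   # 519
print(span_length("155..673"))  # 519
```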



PFam analysis predicts that the NOV16a protein contains the domains shown in
Table 16E.



Table 16E. Domain Analysis of NOV16a

Pfam Domain | NOV16a Match Region | Identities/Similarities for the Matched Region | Expect Value
Fork head | 46..159 | 39/117 (33%); 84/117 (72%) | 2.9e-20



Example 17.

The NOV17 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 17A. 



Table 17A. NOV17 Sequence Analysis

SEQ ID NO:59    494 bp

NOV17a, CG110882-01 DNA Sequence


AACGCGGCACATGGGAAGGAGCTCTACACACAGTGCCAGGCCTGCCACAAGCCCGTTGAAAAC 
TTCGTCGGCCCGAAGCACTGTGGCCTCATCGGCCGTCCAGCAGCCAGTGTGCCGGGATACGAC 
TATTCAGAAGGCATGAAGGCGTCGGGACTCACCTGGGACGAATCGACGCTCGATCAATTTCTC
ACTTCGCCCGTAGCCTTCGTCAATGGCACGAAGATGGGTTTTGCCGGATTCGATAACCCGAGT
GACCGGGCCGATGTCATTGCCTGGCTGCGCAAGATGAATGACGATCCCACCATCTGCCCGAAG 
AAGAGCTGACACCCATGCGCACGCAAACGCTCATGCGAGCTGCCTCCCGCGTTGCCTGTGGCG 


CCTTGCTCGTCATGTCCGCCGCGCATGCGGCGGCGCCGACACCGTCTGACCCGCGCTCGATCG 


GCGGCGGCGAATGCGCCAAGAATGCTTATAACTGTGTGGGTGCCGCCAGGACG 




ORF Start: at 1    ORF Stop: TGA at 322

SEQ ID NO:60    107 aa    MW at 11607.0kD

NOV17a, CG110882-01 Protein Sequence



NAAHGKELYTQCQACHKPVENFVGPKHCGLIGRPAASVPGYDYSEGMKASGLTWDESTLDQFL
TSPVAFVNGTKMGFAGFDNPSDRADVIAWLRKMNDDPTICPKKS
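The relationship between a DNA sequence, its ORF start, and the encoded polypeptide can be illustrated by translating codon by codon with the standard genetic code until a stop codon is reached. The sketch below is an illustration only and is not the software used to produce the listings. The function name `translate` and the use of "X" for unreadable codons are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int = 1) -> str:
    """Translate from the 1-based orf_start up to the first stop codon."""
    seq = "".join(dna.split()).upper()  # tolerate embedded whitespace
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" marks an unreadable codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 63 bases of the NOV17a DNA sequence; the ORF starts at position 1.
first_line = "AACGCGGCACATGGGAAGGAGCTCTACACACAGTGCCAGGCCTGCCACAAGCCCGTTGAAAAC"
print(translate(first_line))  # NAAHGKELYTQCQACHKPVEN
```

The output matches the first 21 residues of the NOV17a protein sequence given above.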



Further analysis of the NOV17a protein yielded the following properties shown in
Table 17B.



Table 17B. Protein Sequence Properties NOV17a

PSort analysis: 0.4500 probability located in cytoplasm; 0.2852 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV17a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 17C.



Table 17C. Geneseq Results for NOV17a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV17a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
ABB71120 | Drosophila melanogaster polypeptide SEQ ID NO 40152 - Drosophila melanogaster, 108 aa. [WO200171042-A2, 27-SEP-2001] | 5..94/11..104 | 40/94 (42%); 59/94 (62%) | 2e-15
AAG38073 | Arabidopsis thaliana protein fragment SEQ ID NO: 46915 - Arabidopsis thaliana, 112 aa. [EP1033405-A2, 06-SEP-2000] | 1..95/11..109 | 39/99 (39%); 58/99 (58%) | 8e-15
AAG16602 | Arabidopsis thaliana protein fragment SEQ ID NO: 17311 - Arabidopsis thaliana, 112 aa. [EP1033405-A2, 06-SEP-2000] | 1..95/11..109 | 39/99 (39%); 58/99 (58%) | 8e-15
AAG27047 | Zea mays protein fragment SEQ ID NO: 31733 - Zea mays subsp. mays, 119 aa. [EP1033405-A2, 06-SEP-2000] | 1..99/11..[illegible] | 39/103 (37%); 58/103 (55%) | 2e-14
AAY77943 | A. thaliana environmental stress tolerance related protein - Arabidopsis thaliana, 112 aa. [WO200008187-A2, 17-FEB-2000] | 1..95/11..109 | 40/99 (40%); 57/99 (57%) | 3e-14
In a BLAST search of public sequence databases, the NOV17a protein was found to
have homology to the proteins shown in the BLASTP data in Table 17D.



Table 17D. Public BLASTP Results for NOV17a

Protein Accession Number | Protein/Organism/Length | NOV17a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
P00082 | Cytochrome C2 - Rhodomicrobium vannielii, 104 aa. | 5..97/7..102 | 47/96 (48%); 70/96 (71%) | 6e-22
Q8YEH5 | Cytochrome C-552 - Brucella melitensis, 202 aa. | 3..104/75..182 | 53/108 (49%); 71/108 (65%) | 2e-21
Q939T7 | Flavocytochrome c cytochrome subunit - Rhodovulum sulfidophilum (Rhodobacter sulfidophilus), 238 aa. | 5..105/34..140 | 43/107 (40%); 67/107 (62%) | 4e-17
Q53144 | Isocytochrome C2 precursor - Rhodobacter sphaeroides (Rhodopseudomonas sphaeroides), 144 aa. | 3..101/26..130 | 43/105 (40%); 63/105 (59%) | 9e-17
Q98BN4 | Cytochrome c - Rhizobium loti (Mesorhizobium loti), 215 aa. | 1..104/71..181 | 49/111 (44%); 63/111 (56%) | 1e-16



PFam analysis predicts that the NOV17a protein contains the domains shown in
Table 17E.



Table 17E. Domain Analysis of NOV17a

Pfam Domain | NOV17a Match Region | Identities/Similarities for the Matched Region | Expect Value
cytochrome_c | 1..97 | 41/116 (35%); 76/116 (66%) | 7e-23



Example 18. 

The NOV18 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 18A.



Table 18A. NOV18 Sequence Analysis

SEQ ID NO:61    4529 bp

NOV18a, CG111188-01 DNA Sequence

GCGCCGCGTCTTCCCGGTCTCCTTTCCCGGCCGCACAGGGTTTTATAGGATCACATTGACAAA

AGTACCATGGAGTTTTATGAGTCAGCATATTTTATTGTTCTTATTCCTTCAATAGTTATTATA



GTAATTTTCCTCTTCTTCTGGCTTTTCATGAAAGAAACATTATATGATGAAGTTCTTGCAAAA 

CAGAAAAGAGAACAAAAGCTTATTCCTACCAAAACAGATAAAAAGAAAGCAGAAAAGAAAAAG 

AATAAAAAGAAAGAAATCCAGAATGGAAACCTCCATGAATCCGACTCTGAGAGTGTACCTCGA 

GACTTTAAATTATCAGATGCTTTGGCAGTAGAAGATGATCAAGTTGCACCTGTTCCATTGAAT 

GTCGTTGAAACTTCAAGTAGTGTTAGGGAAAGAAAAAAGAAGGAAAAGAAACAAAAGCCTGTG 

CTTGAAGAGCAGGTCATCAAAGAAAGTGACGCATCAAAGATTCCTGGCAAAAAAGTAGAACCT 

GTCCCAGTTACTAAACAGCCCACCCCTCCCTCTGAAGCAGCTGCCTCGAAGAAGAAACCAGGG 

CAGAAGAAGTCTAAAAATGGAAGCGATGACCAGGATAAAAAGGTGGAAACTCTCATGGTACCA 

TCAAAAAGGCAAGAAGCATTGCCCCTCCACCAAGAGACTAAACAAGAAAGTGGATCAGGGAAG 

AAAGCTTCATCAAAGAAACAAAAGACAGAAAATGTCTTCGTAGATGAACCCCTTATTCATGCA 

ACTACTTATATTCCTTTGATGGATAATGCTGACTCAAGTCCTGTGGTAGATAAGAGAGAGGTT

ATTGATTTGCTTAAACCTGACCAAGTAGAAGGGATCCAGAAATCTGGGACTAAAAAACTGAAG 

ACCGAAACTGACAAAGAAAATGCTGAAGTGAAGTTTAAAGATTTTCTTCTGTCCTTGAAGACT 

ATGATGTTTTCTGAAGATGAGGCTCTTTGTGTTGTAGACTTGCTAAAGGAGAAGTCTGGTGTA 

ATACAAGATGCTTTAAAGAAGTCAAGTAAGGGAGAATTGACTACGCTTATACATCAGCTTCAA 

GAAAAGGACAAGTTACTCGCTGCTGTGAAGGAAGATGCTGCTGCTACAAAGGATCGGTGTAAG 

CAGTTAACCCAGGAAATGATGACAGAGAAAGAAAGAAGCAATGTGGTTATGACAAGGATGAAA 

GATCGGATTGGAACATTAGAAAAGGAACATAATGTATTTCAAAACAAAATACATGTCAGTTAT 

CAAGAGACTCAACAGATGCAGATGAAGTTTCAGCAAGTTCGTGAGCAGATGGAGGCAGAGATA 

GCTCACTTGAAGCAGGAAAATGGTATACTGAGAGATGCAGTCAGCAACACTACAAATCAACTG 

GAAAGCAAGCAGTCTGCAGAACTAAATAAACTACGCCAGGATTATGCTAGGTTGGTGAATGAG 

CTGACTGAGAAAACAGGAAAGCTACAGCAAGAGGAAGTCCAAAAGAAGAATGCTGAGCAAGCA 

GCTACTCAGTTGAAGGTTCAACTACAAGAAGCTGAGAGAAGGTGGGAAGAAGTTCAGAGCTAC 

ATCAGGAAGAGAACAGCGGAACATGAGGCAGCACAGCAAGATTTACAGAGTAAATTTGTGGCC 

AAAGAAAATGAAGTACAGAGTCTGCATAGTAAGCTTACAGATACCTTGGTATCAAAACAACAG 

TTGGAGCAAAGACTAATGCAGTTAATGGAATCAGAGCAGAAAAGGGTGAACAAAGAAGAGTCT

CTACAAATGCAGGTTCAGGATATTTTGGAGCAGAATGAGGCTTTGAAAGCTCAAATTCAGCAG 

TTCCATTCCCAGATAGCAGCCCAGACCTCCGCTTCAGTTCTAGCAGAAGAATTACATAAAGTG 

ATTGCAGAAAAGGATAAGCAGATAAAACAGACTGAAGATTCTTTAGCAAGTGAACGTGATCGT 

TTAACAAGTAAAGAAGAGGAACTTAAGGATATACAGAATATGAATTTCTTATTAAAAGCTGAA 

GTGCAGAAATTACAGGCCCTGGCAAATGAGCAGGCTGCTGCTGCACATGAATTGGAGAAGATG 

CAACAAAGTGTTTATGTTAAAGATGATAAAATAAGATTGCTGGAAGAGCAACTACAACATGAA 

ATTTCAAACAAAATGGAAGAATTTAAGATTCTAAATGACCAAAACAAAGCATTAAAATCAGAA 

GTTCAGAAGCTACAGACTCTTGTTTCTGAACAGCCTAATAAGGATGTTGTGGAACAAATGGAA 

AAATGCATTCAAGAAAAAGATGAGAAGTTAAAGACTGTGGAAGAATTACTTGAAACTGGACTT

ATTCAGGTGGCAACTAAAGAAGAGGAGCTGAATGCAATAAGAACAGAAAATTCATCTCTGACA 

AAAGAAGTTCAAGACTTAAAAGCTAAGCAAAATGATCAGGTTTCTTTTGCCTCTCTAGTTGAA 

GAACTTAAGAAAGTGATCCATGAGAAAGATGGAAAGATCAAGTCTGTAGAAGAGCTTCTGGAG

GCAGAACTTCTCAAAGTTGCTAACAAGGAGAAAACTGTTCAGGATTTGAAACAGGAAATAAAG 

GCTCTAAAAGAAGAAATAGGAAATGTCCAGCTTGAAAAGGCTCAACAGTTATCTATCACTTCC 

AAAGTTCAGGAGCTTCAGAACTTATTAAAAGGAAAAGAGGAACAGATGAATACCATGAAGGCT

GTTTTGGAAGAGAAAGAGAAAGACCTAGCCAATACAGGGAAGTGGTTACAGGATCTTCAAGAA 

GAAAATGAATCTTTAAAAGCACATGTTCAGGAAGTAGCACAACATAACTTGAAAGAGGCCTCT 

TCTGCATCACAGTTTGAAGAACTTGAGATTGTGTTGAAAGAAAAGGGAAATGAATTGAAGAGG

TTAGAAGCCATGCTAAAAGAGAGGGAGAGTGATCTTTCTAGCAAAACACAGCTGTTACAGGAT 

GTACAAGATGAAAACAAATTGTTTAAGTCCCAAATTGAGCAGCTTAAACAACAAAACTACCAA 

CAGGCATCTTCTTTTCCCCCTCATGAAGAATTATTAAAAGTAATTTCAGAAAGAGAGAAAGAA 

ATAAGTGGTCTCTGGAATGAGTTAGATTCTTTGAAGGATGCAGTTGAACACCAGAGGAAGAAA 

AACAATGACCTTCGGGAGAAAAACTGGGAAGCAATGGAAGCATTGGCATCAACTGAAAAAATG 

CTGCAGGACAAAGTGAACAAGACTTCCAAGGAAAGGCAGCAACAGGTGGAAGCTGTTGAGTTG 

GAGGCTAAAGAAGTTCTCAAAAAATTATTTCCAAAGGTGTCTGTCCCTTCTAATTTGAGTTAT 

GGTGAATGGTTGCATGGATTTGAAAAAAAGGCAAAAGAATGTATGGCTGGAACTTCAGGGTCA 

GAGGAGGTTAAGGTTCTAGAGCACAAGTTGAAAGAAGCTGATGAAATGCACACATTGTTACAG 

CTAGAGTGTGAAAAATACAAATCCGTCCTTGCAGAAACAGAAGGAATTTTACAGAAGCTACAG 

AGAAGTGTTGAGCAAGAAGAAAATAAATGGAAAGTTAAGGTCGATGAATCACACAAGACTATT

AAACAGATGCAGTCATCATTTACATCTTCAGAACAAGAGCTAGAGCGATTAAGAAGCGAAAAT

AAGGATATTGAAAATCTGAGAAGAGAACGAGAACATTTGGAAATGGAACTAGAAAAGGCAGAG

ATGGAACGATCTACCTATGTTACAGAAGTCAGAGAGTTGAAGGCACAGTTAAATGAAACACTC 

ACAAAACTTAGAACTGAACAAAATGAAAGACAGAAGGTAGCTGGTGATTTGCATAAGGCTCAA

CAGTCACTGGAGCTTATCCAGTCAAAAATAGTAAAAGCTGCTGGAGACACTACTGTTATTGAA 

AATAGTGATGTTTCCCCAGAAACGGAGTCTTCTGAGAAGGAGACAATGTCTGTAAGTCTAAAT 

CAGACTGTAACACAGTTACAGCAGTTGCTTCAGGCGGTAAACCAACAGCTCACAAAGGAGAAA

GAGCACTACCAGGTGTTAGAGTGAAGTAATTGGGAAACTGTTCATTTGAGGATAAAAAAGGCA 



TTGTATTATATTTTGCCAAATTAAAGCCTTATTTATGTTTTCACCCTTTCTACTTTGTCAGAA 
ACACTGAACAGAGTTTTGTCTTTTCTAATCCTTGTTAGACTACTGATTTAAAGAAGGAAAAAA 
AAAAGCCAACTCTGTAGACACCTTCAGAGTTTAGTTTTATAATAAAAACTGTTTGAATAATTA 

GACCTTTACATTCCTGAAGATAAACATGTAATCTTTTATCTTATTTTGCTCAATAAAATTGTT 


CAGAAGATCAAAGTGGTAAAGACAATGTAAAATTTAACATTTTAATACTGATGTTGTACACTG 


TTTTACTTAACATTTTGGGAAGTAACTGCCTCTGACTTCAACTCAAGAAAACACTTTTTTGTT 


GCTAATGTAATCGGTTTTTGTAATGGCGTCAGCAAATAAAAGGATGCTTATTATTC 


ORF Start: ATG at 70    ORF Stop: TGA at 4054

SEQ ID NO:62    1328 aa    MW at 152804.3kD

NOV18a, CG111188-01 Protein Sequence


MEFYESAYFIVLIPSIVITVIFLFFWLFMKETLYDEVLAKQKREQKLIPTKTDKKKAEKKKNK 
KKEIQNGNLHESDSESVPRDFKLSDALAVEDDQVAPVPLNVVETSSSVRERKKKEKKQKPVLE
EQVIKESDASKIPGKKVEPVPVTKQPTPPSEAAASKKKPGQKKSKNGSDDQDKKVETLMVPSK 
RQEALPLHQETKQESGSGKKASSKKQKTENVFVDEPLIHATTYIPLMDNADSSPVVDKREVID
LLKPDQVEGIQKSGTKKLKTETDKENAEVKFKDFLLSLKTMMFSEDEALCVVDLLKEKSGVIQ
DALKKSSKGELTTLIHQLQEKDKLLAAVKEDAAATKDRCKQLTQEMMTEKERSNVVMTRMKDR 
IGTLEKEHNVFQNKIHVSYQETQQMQMKFQQVREQMEAEIAHLKQENGILRDAVSNTTNQLES 
KQSAELNKLRQDYARLVNELTEKTGKLQQEEVQKKNAEQAATQLKVQLQEAERRWEEVQSYIR
KRTAEHEAAQQDLQSKFVAKENEVQSLHSKLTDTLVSKQQLEQRLMQLMESEQKRVNKEESLQ 
MQVQDILEQNEALKAQIQQFHSQIAAQTSASVLAEELHKVIAEKDKQIKQTEDSLASERDRLT 
SKEEELKDIQNMNFLLKAEVQKLQALANEQAAAAHELEKMQQSVYVKDDKIRLLEEQLQHEIS
NKMEEFKILNDQNKALKSEVQKLQTLVSEQPNKDVVEQMEKCIQEKDEKLKTVEELLETGLIQ
VATKEEELNAIRTENSSLTKEVQDLKAKQNDQVSFASLVEELKKVIHEKDGKIKSVEELLEAE
LLKVANKEKTVQDLKQEIKALKEEIGNVQLEKAQQLSITSKVQELQNLLKGKEEQMNTMKAVL
EEKEKDLANTGKWLQDLQEENESLKAHVQEVAQHNLKEASSASQFEELEIVLKEKGNELKRLE 
AMLKERESDLSSKTQLLQDVQDENKLFKSQIEQLKQQNYQQASSFPPHEELLKVISEREKEIS
GLWNELDSLKDAVEHQRKKNNDLREKNWEAMEALASTEKMLQDKVNKTSKERQQQVEAVELEA
KEVLKKLFPKVSVPSNLSYGEWLHGFEKKAKECMAGTSGSEEVKVLEHKLKEADEMHTLLQLE 
CEKYKSVLAETEGILQKLQRSVEQEENKWKVKVDESHKTIKQMQSSFTSSEQELERLRSENKD 
IENLRREREHLEMELEKAEMERSTYVTEVRELKAQLNETLTKLRTEQNERQKVAGDLHKAQQS 
LELIQSKIVKAAGDTTVIENSDVSPETESSEKETMSVSLNQTVTQLQQLLQAVNQQLTKEKEH 
YQVLE 
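A molecular weight can be estimated from a protein sequence by summing average residue masses and adding one water molecule for the free termini. The source does not state how the MW figures in these tables were computed, so the sketch below is only an assumed, conventional method. The name `molecular_weight` is hypothetical, and the masses are standard average values in daltons.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified polypeptide."""
    seq = "".join(protein.split()).upper()  # tolerate embedded whitespace
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# Sanity check on a single residue: free glycine is about 75.07 Da.
print(round(molecular_weight("G"), 2))  # 75.07
```

A sequence containing a character outside the 20 standard residues raises `KeyError`, which is useful here because it flags transcription errors in a listing.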


SEQ ID NO:63    4326 bp

NOV18b, CG111188-02 DNA Sequence


GCCGCACAGGGTTTTATAGGATCACATTGACAAAAGTACCATGGAGTTTTATGAGTCAGCATA 


TTTTATTGTTCTTATTCCTTCAATAGTTATTACAGTAATTTTCCTCTTCTTCTGGCTTTTCAT 
GAAAGAAACATTATATGATGAAGTTCTTGCAAAACAGAAAAGAGAACAAAAGCTTATTCCTAC
CAAAACAGATAAAAAGAAAGCAGAAAAGAAAAAGAATAAAAAGAAAGAAATCCAGAATGGAAA 
CCTCCATGAATCCGACTCTGAGAGTGTACCTCGAGACTTTAAATTATCAGATGCTTTGGCAGT 
AGAAGATGATCAAGTTGCACCTGTTCCATTGAATGTCGTTGAAACTTCAAGTAGTGTTAGGGA 
AAGAAAAAAGAAGGAAAAGAAACAAAAGCCTGTGCTTGAAGAGCAGGTCATCAAAGAAAGTGA
CGCATCAAAGATTCCTGGCAAAAAAGTAGAACCTGTCCCAGTTACTAAACAGCCCACCCCTCC 
CTCTGAAGCAGCTGCCTCGAAGAAGAAACCAGGGCAGAAGAAGTCTAAAAATGGAAGCGATGA 
CCAGGATAAAAAGGTGGAAACTCTCATGGTACCATCAAAAAGGCAAGAAGCATTGCCCCTCCA
CCAAGAGACTAAACAAGAAAGTGGATCAGGGAAGAAGAAAGCTTCATCAAAGAAACAAAAGAC
AGAAAATGTCTTCGTAGATGAACCCCTTATTCATGCAACTACTTATATTCCTTTGATGGATAA
TGCTGACTCAAGTCCTGTGGTAGATAAGAGAGAGGTTATTGATTTGCTTAAACCTGACCAAGT 
AGAAGGGATCCAGAAATCTGGGACTAAAAAACTGAAGACCGAAACTGACAAAGAAAATGCTGA 
AGTGAAGTTTAAAGATTTTCTTCTGTCCTTGAAGACTATGATGTTTTCTGAAGATGAGGCTCT 
TTGTGTTGTAGACTTGCTAAAGGAGAAGTCTGGTGTAATACAAGATGCTTTAAAGAAGTCAAG
TAAGGGAGAATTGACTACGCTTATACATCAGCTTCAAGAAAAGGACAAGTTACTCGCTGCTGT 
GAAGGAAGATGCTGCTGCTACAAAGGATCGGTGTAAGCAGTTAACCCAGGAAATGATGACAGA 
GAAAGAAAGAAGCAATGTGGTTATAACAAGGATGAAAGATCGAATTGGAACATTAGAAAAGGA 
ACATAATGTATTTCAAAACAAAATACATGTCAGTTATCAAGAGACTCAACAGATGCAGATGAA 
GTTTCAGCAAGTTCGTGAGCAGATGGAGGCAGAGATAGCTCACTTGAAGCAGGAAAATGGTAT
ACTGAGAGATGCAGTCAGCAACACTACAAATCAACTGGAAAGCAAGCAGTCTGCAGAACTAAA 
TAAACTACGCCAGGATTATGCTAGGTTGGTGAATGAGCTGACTGAGAAAACAGGAAAGCTACA 
GCAAGAGGAAGTCCAAAAGAAGAATGCTGAGCAAGCAGCTACTCAGTTGAAGGTTCAACTACA 
AGAAGCTGAGAGAAGGTGGGAAGAAGTTCAGAGCTACATCAGGAAGAGAACAGCGGAACATGA 
GGCAGCACAGCAAGATTTACAGAGTAAATTTGTGGCCAAAGAAAATGAAGTACAGAGTCTGCA 
TAGTAAGCTTACAGATACCTTGGTATCAAAACAACAGTTGGAGCAAAGACTAATGCAGTTAAT 
GGAATCAGAGCAGAAAAGGGTGAACAAAGAAGAGTCTCTACAAATGCAGGTTCAGGATATTTT
GGAGCAGAATGAGGCTTTGAAAGCTCAAATTCAGCAGTTCCATTCCCAGATAGCAGCCCAGAC
CTCCGCTTCAGTTCTAGCAGAAGAATTACATAAAGTGATTGCAGAAAAGGATAAGCAGATAAA
ACAGACTGAAGATTCTTTAGCAAGTGAACGTGATCGTTTAACAAGTAAAGAAGAGGAACTTAA 
GGATATACAGAATATGAATTTCTTATTAAAAGCTGAAGTGCAGAAATTACAGGCCCTGGCAAA 
TGAGCAGGCTGCTGCTGCACATGAATTGGAGAAGATGCAACAAAGTGTTTATGTTAAAGATGA 
TAAAATAAGATTGCTGGAAGAGCAACTACAACATGAAATTTCAAACAAAATGGAAGAATTTAA 
GATTCTAAATGACCAAAACAAAGCATTAAAATCAGAAGTTCAGAAGCTACAGACTCTTGTTTC 
TGAACAGCCTAATAAGGATGTTGTGGAACAAATGGAAAAATGCATTCAAGAAAAAGATGAGAA
GTTAAAGACTGTGGAAGAATTACTTGAAACTGGACTTATTCAGGTGGCAACTAAAGAAGAGGA
GCTGAATGCAATAAGAACAGAAAATTCATCTCTGACAAAAGAAGTTCAAGACTTAAAAGCTAA
GCAAAATGATCAGGTTTCTTTTGCCTCTCTAGTTGAAGAACTTAAGAAAGTGATCCATGAGAA
AGATGGAAAGATCAAGTCTGTAGAAGAGCTTCTGGAGGCAGAACTTCTCAAAGTTGCTAACAA
GGAGAAAACTGTTCAGGATTTGAAACAGGAAATAAAGGCTCTAAAAGAAGAAATAGGAAATGT
CCAGCTTGAAAAGGCTCAACAGTTATCTATCACTTCCAAAGTTCAGGAGCTTCAGAACTTATT 
AAAAGGAAAAGAGGAACAGATGAATACCATGAAGGCTGTTTTGGAAGAGAAAGAGAAAGACCT 
AGCCAATACAGGGAAGTGGTTACAGGATCTTCAAGAAGAAAATGAATCTTTAAAAGCACATGT 
TCAGGAAGTAGCACAACATAACTTGAAAGAGGCCTCTTCTGCATCACAGTTTGAAGAACTTGA 
GATTGTGTTGAAAGAAAAGGAAAATGAATTGAAGAGGTTAGAAGCCATGCTAAAAGAGAGGGA 
GAGTGATCTTTCTAGCAAAACACAGCTGTTACAGGATGTACAAGATGAAAACAAATTGTTTAA 
GTCCCAAATTGAGCAGCTTAAACAACAAAACTACCAACAGGCATCTTCTTTTCCCCCTCATGA 
AGAATTATTAAAAGTAATTTCAGAAAGAGAGAAAGAAATAAGTGGTCTCTGGAATGAGTTAGA 
TTCTTTGAAGGATGCAGTTGAACACCAGAGGAAGAAAAACAATAGTTATGGTGAATGGTTGCA 
TGGATTTGAAAAAAAGGCAAAAGAATGTATGGCTGGAACTTCAGGGTCAGAGGAGGTTAAGGT
TCTAGAGCACAAGTTGAAAGAAGCTGATGAAATGCACACATTGTTACAGCTAGAGTGTGAAAA 
ATACAAATCCGTCCTTGCAGAAACAGAAGGAATTTTACAGAAGCTACAGAGAAGTGTTGAGCA 
AGAAGAAAATAAATGGAAAGTTAAGGTCGATGAATCACACAAGACTATTAAACAGATGCAGTC 
ATCATTTACATCTTCAGAACAAGAGCTAGAGCGATTAAGAAGCGAAAATAAGGATATTGAAAA 
TCTGAGAAGAGAACGAGAACATTTGGAAATGGAACTAGAAAAGGCAGAGATGGAACGATCTAC 
CTATGTTACAGAAGTCAGAGAGTTGAAGGCACAGTTAAATGAAACACTCACAAAACTTAGAAC 
TGAACAAAATGAAAGACAGAAGGTAGCTGGTGATTTGCATAAGGCTCAACAGTCACTGGAGCT 
TATCCAGTCAAAAATAGTAAAAGCTGCTGGAGACACTACTGTTATTGAAAATAGTGATGTTTC 
CCCAGAAACGGAGTCTTCTGAGAAGGAGACAATGTCTGTAAGTCTAAATCAGACTGTAACACA 
GTTACAGCAGTTGCTTCAGGCGGTAAACCAACAGCTCACAAAGGAGAAAGAGCACTACCAGGT 
GTTAGAGTGAAGTAATTGGGAAACTGTTCATTTGAGGATAAAAAAGGCATTGTATTATATTTT 


GCCAAATTAAAGCCTTATTTATGTTTTCACCCTTTCTACTTTGTCAGAAACACTGAACAGAGT 


TTTGTCTTTTCTAATCCTTGTTAGACTACTGATTTAAAGAAGGAAAAAAAAAAGCCAACTCTG 


TAGACACCTTCAGAGTTTAGTTTTATAATAAAAACTGTTTGAATAATTAGACCTTTACATTCC 


TGAAGATAAACATGTAATCTTTTATCTTATTTTGCTCAATAAAATTGTTCAGAAGATCAAAGT


GGTAAAGACAATGTAAAATTTAACATTTTAATACTGATGTTGTACACTGTTTTACTTAACATT 


TTGGGAAGTAACTGCCTCTGACTTCAACTCAAGAAAACACTTTTTTGTTGCTAATGTAATCGG 


TTTTTGTAATGGCGTCAGCAAATAAAAGGATGCTTATTATTC 




ORF Start: ATG at 41 




ORF Stop: TGA at 3851 




SEQ ID NO:64    1270 aa    MW at 146190.7kD

NOV18b, CG111188-02 Protein Sequence


MEFYESAYFIVLIPSIVITVIFLFFWLFMKETLYDEVLAKQKREQKLIPTKTDKKKAEKKKNK
KKEIQNGNLHESDSESVPRDFKLSDALAVEDDQVAPVPLNVVETSSSVRERKKKEKKQKPVLE
EQVIKESDASKIPGKKVEPVPVTKQPTPPSEAAASKKKPGQKKSKNGSDDQDKKVETLMVPSK
RQEALPLHQETKQESGSGKKKASSKKQKTENVFVDEPLIHATTYIPLMDNADSSPVVDKREVI
DLLKPDQVEGIQKSGTKKLKTETDKENAEVKFKDFLLSLKTMMFSEDEALCVVDLLKEKSGVI
QDALKKSSKGELTTLIHQLQEKDKLLAAVKEDAAATKDRCKQLTQEMMTEKERSNVVITRMKD
RIGTLEKEHNVFQNKIHVSYQETQQMQMKFQQVREQMEAEIAHLKQENGILRDAVSNTTNQLE
SKQSAELNKLRQDYARLVNELTEKTGKLQQEEVQKKNAEQAATQLKVQLQEAERRWEEVQSYI
RKRTAEHEAAQQDLQSKFVAKENEVQSLHSKLTDTLVSKQQLEQRLMQLMESEQKRVNKEESL
QMQVQDILEQNEALKAQIQQFHSQIAAQTSASVLAEELHKVIAEKDKQIKQTEDSLASERDRL
TSKEEELKDIQNMNFLLKAEVQKLQALANEQAAAAHELEKMQQSVYVKDDKIRLLEEQLQHEI
SNKMEEFKILNDQNKALKSEVQKLQTLVSEQPNKDVVEQMEKCIQEKDEKLKTVEELLETGLI
QVATKEEELNAIRTENSSLTKEVQDLKAKQNDQVSFASLVEELKKVIHEKDGKIKSVEELLEA
ELLKVANKEKTVQDLKQEIKALKEEIGNVQLEKAQQLSITSKVQELQNLLKGKEEQMNTMKAV
LEEKEKDLANTGKWLQDLQEENESLKAHVQEVAQHNLKEASSASQFEELEIVLKEKENELKRL
EAMLKERESDLSSKTQLLQDVQDENKLFKSQIEQLKQQNYQQASSFPPHEELLKVISEREKEI
SGLWNELDSLKDAVEHQRKKNNSYGEWLHGFEKKAKECMAGTSGSEEVKVLEHKLKEADEMHT
LLQLECEKYKSVLAETEGILQKLQRSVEQEENKWKVKVDESHKTIKQMQSSFTSSEQELERLR
SENKDIENLRREREHLEMELEKAEMERSTYVTEVRELKAQLNETLTKLRTEQNERQKVAGDLH
KAQQSLELIQSKIVKAAGDTTVIENSDVSPETESSEKETMSVSLNQTVTQLQQLLQAVNQQLT
KEKEHYQVLE




SEQ ID NO:65    4416 bp

NOV18c, CG111188-03 DNA Sequence


GCCGCACAGGGTTTTATAGGATCACATTGACAAAAGTACCATGGAGTTTTATGAGTCAGCATA 


TTTTATTGTTCTTATTCCTTCAATAGTTATTACAGTAATTTTCCTCTTCTTCTGGCTTTTCAT
GAAAGAAACATTATATGATGAAGTTCTTGCAAAACAGAAAAGAGAACAAAAGCTTATTCCTAC 
CAAAACAGATAAAAAGAAAGCAGAAAAGAAAAAGAATAAAAAGAAAGAAATCCAGAATGGAAA 
CCTCCATGAATCCGACTCTGAGAGTGTACCTCGAGACTTTAAATTATCAGATGCTTTGGCAGT 
AGAAGATGATCAAGTTGCACCTGTTCCATTGAATGTCGTTGAAACTTCAAGTAGTGTTAGGGA 
AAGAAAAAAGAAGGAAAAGAAACAAAAGCCTGTGCTTGAAGAGCAGGTCATCAAAGAAAGTGA
CGCATCAAAGATTCCTGGCAAAAAAGTAGAACCTGTCCCAGTTACTAAACAGCCCACCCCTCC
CTCTGAAGCAGCTGCCTCGAAGAAGAAACCAGGGCAGAAGAAGTCTAAAAATGGAAGCGATGA
CCAGGATAAAAAGGTGGAAACTCTCATGGTACCATCAAAAAGGCAAGAAGCATTGCCCCTCCA
CCAAGAGACTAAACAAGAAAGTGGATCAGGGAAGAAGAAAGCTTCATCAAAGAAACAAAAGAC
AGAAAATGTCTTCGTAGATGAACCCCTTATTCATGCAACTACTTATATTCCTTTGATGGATAA 
TGCTGACTCAAGTCCTGTGGTAGATAAGAGAGAGGTTATTGATTTGCTTAAACCTGACCAAGT 
AGAAGGGATCCAGAAATCTGGGACTAAAAAACTGAAGACCGAAACTGACAAAGAAAATGCTGA 
AGTGAAGTTTAAAGATTTTCTTCTGTCCTTGAAGACTATGATGTTTTCTGAAGATGAGGCTCT
TTGTGTTGTAGACTTGCTAAAGGAGAAGTCTGGTGTAATACAAGATGCTTTAAAGAAGTCAAG
TAAGGGAGAATTGACTACGCTTATACATCAGCTTCAAGAAAAGGACAAGTTACTCGCTGCTGT
GAAGGAAGATGCTGCTGCTACAAAGGATCGGTGTAAGCAGTTAACCCAGGAAATGATGACAGA
GAAAGAAAGAAGCAATGTGGTTATAACAAGGATGAAAGATCGAATTGGAACATTAGAAAAGGA
ACATAATGTATTTCAAAACAAAATACATGTCAGTTATCAAGAGACTCAACAGATGCAGATGAA
GTTTCAGCAAGTTCGTGAGCAGATGGAGGCAGAGATAGCTCACTTGAAGCAGGAAAATGGTAT
ACTGAGAGATGCAGTCAGCAACACTACAAATCAACTGGAAAGCAAGCAGTCTGCAGAACTAAA 
TAAACTACGCCAGGATTATGCTAGGTTGGTGAATGAGCTGACTGAGAAAACAGGAAAGCTACA 
GCAAGAGGAAGTCCAAAAGAAGAATGCTGAGCAAGCAGCTACTCAGTTGAAGGTTCAACTACA 
AGAAGCTGAGAGAAGGTGGGAAGAAGTTCAGAGCTACATCAGGAAGAGAACAGCGGAACATGA 
GGCAGCACAGCAAGATTTACAGAGTAAATTTGTGGCCAAAGAAAATGAAGTACAGAGTCTGCA
TAGTAAGCTTACAGATACCTTGGTATCAAAACAACAGTTGGAGCAAAGACTAATGCAGTTAAT
GGAATCAGAGCAGAAAAGGGTGAACAAAGAAGAGTCTCTACAAATGCAGGTTCAGGATATTTT
GGAGCAGAATGAGGCTTTGAAAGCTCAAATTCAGCAGTTCCATTCCCAGATAGCAGCCCAGAC
CTCCGCTTCAGTTCTAGCAGAAGAATTACATAAAGTGATTGCAGAAAAGGATAAGCAGATAAA
ACAGACTGAAGATTCTTTAGCAAGTGAACGTGATCGTTTAACAAGTAAAGAAGAGGAACTTAA
GGATATACAGAATATGAATTTCTTATTAAAAGCTGAAGTGCAGAAATTACAGGCCCTGGCAAA
TGAGCAGGCTGCTGCTGCACATGAATTGGAGAAGATGCAACAAAGTGTTTATGTTAAAGATGA
TAAAATAAGATTGCTGGAAGAGCAACTACAACATGAAATTTCAAACAAAATGGAAGAATTTAA 
GATTCTAAATGACCAAAACAAAGCATTAAAATCAGAAGTTCAGAAGCTACAGACTCTTGTTTC 
TGAACAGCCTAATAAGGATGTTGTGGAACAAATGGAAAAATGCATTCAAGAAAAAGATGAGAA 
GTTAAAGACTGTGGAAGAATTACTTGAAACTGGACTTATTCAGGTGGCAACTAAAGAAGAGGA
GCTGAATGCAATAAGAACAGAAAATTCATCTCTGACAAAAGAAGTTCAAGACTTAAAAGCTAA
GCAAAATGATCAGGTTTCTTTTGCCTCTCTAGTTGAAGAACTTAAGAAAGTGATCCATGAGAA 
AGATGGAAAGATCAAGTCTGTAGAAGAGCTTCTGGAGGCAGAACTTCTCAAAGTTGCTAACAA 
GGAGAAAACTGTTCAGGATTTGAAACAGGAAATAAAGGCTCTAAAAGAAGAAATAGGAAATGT 
CCAGCTTGAAAAGGCTCAACAGTTATCTATCACTTCCAAAGTTCAGGAGCTTCAGAACTTATT 
AAAAGGAAAAGAGGAACAGATGAATACCATGAAGGCTGTTTTGGAAGAGAAAGAGAAAGACCT 
AGCCAATACAGGGAAGTGGTTACAGGATCTTCAAGAAGAAAATGAATCTTTAAAAGCACATGT
TCAGGAAGTAGCACAACATAACTTGAAAGAGGCCTCTTCTGCATCACAGTTTGAAGAACTTGA
GATTGTGTTGAAAGAAAAGGAAAATGAATTGAAGAGGTTAGAAGCCATGCTAAAAGAGAGGGA
GAGTGATCTTTCTAGCAAAACACAGCTGTTACAGGATGTACAAGATGAAAACAAATTGTTTAA 
GTCCCAAATTGAGCAGCTTAAACAACAAAACTACCAACAGGCATCTTCTTTTCCCCCTCATGA 
AGAATTATTAAAAGTAATTTCAGAAAGAGAGAAAGAAATAAGTGGTCTCTGGAATGAGTTAGA 
TTCTTTGAAGGATGCAGTTGAACACCAGAGGAAGAAAAACAATGAAAGGCAGCAACAGGTGGA 
AGCTGTTGAGTTGGAGGCTAAAGAAGTTCTCAAAAAATTATTTCCAAAGGTGTCTGTCCCTTC 
TAATTTGAGTTATGGTGAATGGTTGCATGGATTTGAAAAAAAGGCAAAAGAATGTATGGCTGG
AACTTCAGGGTCAGAGGAGGTTAAGGTTCTAGAGCACAAGTTGAAAGAAGCTGATGAAATGCA 
CACATTGTTACAGCTAGAGTGTGAAAAATACAAATCCGTCCTTGCAGAAACAGAAGGAATTTT 
ACAGAAGCTACAGAGAAGTGTTGAGCAAGAAGAAAATAAATGGAAAGTTAAGGTCGATGAATC 
ACACAAGACTATTAAACAGATGCAGTCATCATTTACATCTTCAGAACAAGAGCTAGAGCGATT 
AAGAAGCGAAAATAAGGATATTGAAAATCTGAGAAGAGAACGAGAACATTTGGAAATGGAACT
AGAAAAGGCAGAGATGGAACGATCTACCTATGTTACAGAAGTCAGAGAGTTGAAGGCACAGTT
AAATGAAACACTCACAAAACTTAGAACTGAACAAAATGAAAGACAGAAGGTAGCTGGTGATTT
GCATAAGGCTCAACAGTCACTGGAGCTTATCCAGTCAAAAATAGTAAAAGCTGCTGGAGACAC 
TACTGTTATTGAAAATAGTGATGTTTCCCCAGAAACGGAGTCTTCTGAGAAGGAGACAATGTC 
TGTAAGTCTAAATCAGACTGTAACACAGTTACAGCAGTTGCTTCAGGCGGTAAACCAACAGCT 
CACAAAGGAGAAAGAGCACTACCAGGTGTTAGAGTGAAGTAATTGGGAAACTGTTCATTTGAG
GATAAAAAAGGCATTGTATTATATTTTGCCAAATTAAAGCCTTATTTATGTTTTCACCCTTTC
TACTTTGTCAGAAACACTGAACAGAGTTTTGTCTTTTCTAATCCTTGTTAGACTACTGATTTA
AAGAAGGAAAAAAAAAAGCCAACTCTGTAGACACCTTCAGAGTTTAGTTTTATAATAAAAACT
GTTTGAATAATTAGACCTTTACATTCCTGAAGATAAACATGTAATCTTTTATCTTATTTTGCT 
CAATAAAATTGTTCAGAAGATCAAAGTGGTAAAGACAATGTAAAATTTAACATTTTAATACTG 
ATGTTGTACACTGTTTTACTTAACATTTTGGGAAGTAACTGCCTCTGACTTCAACTCAAGAAA 
ACACTTTTTTGTTGCTAATGTAATCGGTTTTTGTAATGGCGTCAGCAAATAAAAGGATGCTTA 
TTATTC 

ORF Start: ATG at 41    ORF Stop: TGA at 3941

SEQ ID NO:66    1300 aa    MW at 149609.6kD

NOV18c, CG111188-03 Protein Sequence


MEFYESAYFIVLIPSIVITVIFLFFWLFMKETLYDEVLAKQKREQKLIPTKTDKKKAEKKKNK 
KKEIQNGNLHESDSESVPRDFKLSDALAVEDDQVAPVPLNVVETSSSVRERKKKEKKQKPVLE
EQVIKESDASKIPGKKVEPVPVTKQPTPPSEAAASKKKPGQKKSKNGSDDQDKKVETLMVPSK 
RQEALPLHQETKQESGSGKKKASSKKQKTENVFVDEPLIHATTYIPLMDNADSSPVVDKREVI
DLLKPDQVEGIQKSGTKKLKTETDKENAEVKFKDFLLSLKTMMFSEDEALCVVDLLKEKSGVI
QDALKKSSKGELTTLIHQLQEKDKLLAAVKEDAAATKDRCKQLTQEMMTEKERSNVVITRMKD
RIGTLEKEHNVFQNKIHVSYQETQQMQMKFQQVREQMEAEIAHLKQENGILRDAVSNTTNQLE
SKQSAELNKLRQDYARLVNELTEKTGKLQQEEVQKKNAEQAATQLKVQLQEAERRWEEVQSYI 
RKRTAEHEAAQQDLQSKFVAKENEVQSLHSKLTDTLVSKQQLEQRLMQLMESEQKRVNKEESL 
QMQVQDILEQNEALKAQIQQFHSQIAAQTSASVLAEELHKVIAEKDKQIKQTEDSLASERDRL 
TSKEEELKDI QNMNFLLKAE VQKLQALANEQAAAAHELEKMQQS V YVKDDK I RLLEEQLQH E I 
SNKMEEFKILNDQNKALKSEVQKLQTLVSEQPNKDWEQMEKCIQEKDEKLKTVEELLETGLI 
QVATKEEELNAIRTENSSLTKEVQDLKAKQNDQVSFASLVEELKKVIHEKDGKIKSVEELLEA 
ELLKVANKEKTVQDLKQEIKALKEEIGNVQLEKAQQLSITSKVQELQNLLKGKEEQMNTMKAV
LEEKEKDLANTGKWLQDLQEENESLKAHVQEVAQHNLKEASSASQFEELEIVLKEKENELKRL 
EAMLKERESDLSSKTQLLQDVQDENKLFKSQIEQLKQQNYQQASSFPPHEELLKVISEREKEI 
SGLWNELDSLKDAVEHQRKKNNERQQQVEAVELEAKEVLKKLFPKVSVPSNLSYGEWLHGFEK
KAKECMAGTSGSEEVKVLEHKLKEADEMHTLLQLECEKYKSVLAETEGILQKLQRSVEQEENK 
WKVKVDESHKTIKQMQSSFTSSEQELERLRSENKDIENLRREREHLEMELEKAEMERSTYVTE 
VRELKAQLNETLTKLRTEQNERQKVAGDLHKAQQSLELIQSKIVKAAGDTTVIENSDVSPETE 
SSEKETMSVSLNQTVTQLQQLLQAVNQQLTKEKEHYQVLE 
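
The listings above are printed in fixed-width rows, so anyone transcribing them may want to join the rows and drop layout whitespace before counting residues. The following is a minimal sketch of that step. The function name `normalize_sequence` is hypothetical and is not part of the specification.

```python
def normalize_sequence(rows):
    """Join printed rows into one uppercase string, dropping all whitespace.

    Assumption: whitespace inside a row is a layout artifact, not data.
    """
    return "".join("".join(row.split()) for row in rows).upper()


# Final row of the NOV18c protein listing, as printed above.
last_row = ["SSEKETMSVSLNQTVTQLQQLLQAVNQQLTKEKEHYQVLE"]
tail = normalize_sequence(last_row)
print(len(tail))
```

Summing the normalized row lengths of a complete listing gives a quick check against the residue count stated in its header.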



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 18B.



Table 18B. Comparison of NOV18a against NOV18b and NOV18c.

Protein Sequence | NOV18a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV18b | 1..1328; 1..1270 | 1041/1329 (78%); 1044/1329 (78%)
NOV18c | 1..1328; 1..1300 | 1062/1329 (79%); 1063/1329 (79%)
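
The residue ranges in these comparison tables use a `start..end` notation. A minimal sketch of how such a range could be parsed and its span computed follows. It assumes 1-based, inclusive coordinates, which is an assumption about the notation, and the helper names are hypothetical.

```python
def parse_range(text):
    """Parse a 'start..end' residue range into a (start, end) tuple of ints."""
    start, end = text.split("..")
    return int(start), int(end)


def span_length(text):
    """Number of residues covered by an inclusive 'start..end' range."""
    start, end = parse_range(text)
    return end - start + 1


print(span_length("1..1328"))  # query side of the NOV18b comparison
print(span_length("1..1270"))  # match side of the NOV18b comparison
```

Comparing the two spans with the alignment length in the identities column gives a rough idea of how many gap positions the alignment contains.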



Further analysis of the NOV18a protein yielded the following properties shown in
Table 18C.



Table 18C. Protein Sequence Properties NOV18a

PSort analysis: 0.8200 probability located in endoplasmic reticulum (membrane); 0.1900 probability located in plasma membrane; 0.1800 probability located in nucleus; 0.1000 probability located in endoplasmic reticulum (lumen)

SignalP analysis: Cleavage site between residues 40 and 41
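
A predicted cleavage site between residues 40 and 41 implies a 40-residue signal peptide followed by the mature chain. The sketch below shows the corresponding split. It uses a placeholder sequence rather than NOV18a itself, and the function name is hypothetical.

```python
def split_at_cleavage(sequence, last_signal_residue):
    """Split a protein after the given 1-based residue.

    Returns (signal_peptide, mature_chain).
    """
    return sequence[:last_signal_residue], sequence[last_signal_residue:]


# Placeholder sequence (hypothetical, not NOV18a): 40 'A' followed by 10 'K'.
toy = "A" * 40 + "K" * 10
signal, mature = split_at_cleavage(toy, 40)
print(len(signal), len(mature))
```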



A search of the NOV18a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 18D.
Table 18D. Geneseq Results for NOV18a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV18a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB57163 | Mouse ischaemic condition related protein sequence SEQ ID NO:396 - Mus musculus, 1327 aa. [WO200188188-A2, 22-NOV-2001] | 1..1320; 1..1326 | 1072/1354 (79%); 1183/1354 (87%) | 0.0
AAG67538 | Amino acid sequence of a human p180 protein - Homo sapiens, 1240 aa. [WO200164947-A1, 07-SEP-2001] | 50..1068; 235..1229 | 254/1069 (23%); 483/1069 (44%) | 1e-63
AAM79523 | Human protein SEQ ID NO 3169 - Homo sapiens, 1003 aa. [WO200157190-A2, 09-AUG-2001] | 21..1068; 51..992 | 261/1084 (24%); 482/1084 (44%) | 3e-62
AAM78539 | Human protein SEQ ID NO 1201 - Homo sapiens, 977 aa. [WO200157190-A2, 09-AUG-2001] | 21..1068; 25..966 | 261/1084 (24%); 482/1084 (44%) | 3e-62
AAW89721 | Canine ribosome receptor - Canis familiaris, 1484 aa. [WO9901565-A1, 14-JAN-1999] | 36..982; 511..1459 | 244/1006 (24%); 476/1006 (47%) | 4e-62



In a BLAST search of public sequence databases, the NOV18a protein was found to
have homology to the proteins shown in the BLASTP data in Table 18E.

Table 18E. Public BLASTP Results for NOV18a

Protein Accession Number | Protein/Organism/Length | NOV18a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q14707 | 156 kDa protein - Homo sapiens (Human), 1356 aa. | 1..1328; 1..1356 | 1328/1356 (97%); 1328/1356 (97%) | 0.0
Q13999 | CG1 protein (KIAA0004 protein) - Homo sapiens (Human), 1300 aa. | 1..1328; 1..1300 | 1297/1329 (97%); 1298/1329 (97%) | 0.0
1 09796 1 | Kinectin - Vulpes vulpes (Red fox), 1330 aa. | 1..1328; 1..1330 | 1240/1330 (93%); 1284/1330 (96%) | 0.0
Q61595 | Kinectin - Mus musculus (Mouse), 1327 aa. | 1..1320; 1..1326 | 1072/1354 (79%); 1183/1354 (87%) | 0.0
Q90631 | Kinectin - Gallus gallus (Chicken), 1364 aa. | 1..1328; 1..1364 | 889/1370 (64%); 1119/1370 (80%) | 0.0
PFam analysis predicts that the NOV18a protein contains the domains shown in
Table 18F.



Table 18F. Domain Analysis of NOV18a

Pfam Domain | NOV18a Match Region | Identities; Similarities for the Matched Region | Expect Value
LBP_BPI_CETP | 576..608 | 8/35 (23%); 25/35 (71%) | 0.27





Example 19.

The NOV19 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 19A.



Table 19A. NOV19 Sequence Analysis

SEQ ID NO: 67 | 921 bp

NOV19a, CG111473-01 DNA Sequence

CAGACGCCTCCAGGATCTGTCGGCAGCTGCTGTTCTGAGGGAGAGCAGAGACCATGTCTGACA


TAGAAGAGGTGGTGGAAGAGTACGAGGAGGAGGAGCAGGAAGAAGCAGCTGTTGAAGAGCAGG 
AGGAGGCAGCGGAAGAGGATGCTGAAGCAGAGGCTGAGACCGAGGAGACCAGGGCAGAAGAAG 
ATGAAGAAGAAGAGGAAGCAAAGGAGGCTGAAGATGGCCCAATGGAGGAGTCCAAACCAAAGC 
CCAGGTCGTTCATGCCCAACTTGGTGCCTCCCAAGATCCCCGATGGAGAGAGAGTGGACTTTG 
ATGACATCCACCGGAAGCGCATGGAGAAGGACCTGAATGAGTTGCAGGCGCTGATTGAGGCTC 
ACTTTGAGAACAGGAAGAAAGAGGAGGAGGAGCTCGTTTCTCTCAAAGACAGGATCGAGAGAC 
GTCGGGCAGAGCGGGCCGAGCAGCAGCGCATCCGGAATGAGCGGGAAAAGAAGAAGAAGATTC 
TGGCTGAGAGGAGGAAGGTGCTGGCCATTGACCACCTGAATGAAGATCAGCTGAGGGAGAAGG 
CCAAGGAGCTGTGGCAGAGCATCTATAACTTGGAGGCAGAGAAGTTCGACCTGCAGGAGAAGT 
TCAAGCAGCAGAAATATGAGATCAATGTTCTCCGAAACAGGATCAACGATAACCAGAAAGTCT 
CCAAGACCCGCGGGAAGGCTAAAGTCACCGGGCGCTGGAAATAGAGCCTGGCCTCCTTCACCA 


AAGATCTGCTCCTCGCTCGCACCTGCCTCCGGCCTGCACTCCCCCAGTTCCCGGGCCCTCCTG
GGCACCCCAGGCAGCTCCTGTTTGGAAATGGGGAGCTGGCCTAGGTGGGAGCCACCACTCCTG
CCTGCCCCCACACCCACTCCACACCAGTAATAAAAAGCC


ORF Start: ATG at 54 | ORF Stop: TAG at 735

SEQ ID NO: 68 | 227 aa | MW at 27175.7kD

NOV19a, CG111473-01 Protein Sequence


MSDIEEVVEEYEEEEQEEAAVEEQEEAAEEDAEAEAETEETRAEEDEEEEEAKEAEDGPMEES
KPKPRSFMPNLVPPKIPDGERVDFDDIHRKRMEKDLNELQALIEAHFENRKKEEEELVSLKDR 
IERRRAERAEQQRIRNEREKKKKILAERRKVLAIDHLNEDQLREKAKELWQSIYNLEAEKFDL 
QEKFKQQKYEINVLRNRINDNQKVSKTRGKAKVTGRWK 


SEQ ID NO: 69 | 1154 bp

NOV19b, CG111473-02 DNA Sequence


CGGCCGCGTCGACAGCAGACGCCTCCAGGATCTGTCGGCAGCTGCTGTTCTGAGGGAGAGCAG 


AGACCATGTCTGACATAGAAGAGGTGGTGGAAGAGTACGAGGAGGAGGAGCAGGAAGAAGCAG


CTGTTGAAGAGCAGGAGGAGGCAGCGGAAGAGGATGCTGAAGCAGAGGCTGAGACCGAGGAGA 
CCAGGGCAGAAGAAGATGAAGAAGAAGAGGAAGCAAAGGAGGCTGAAGATGGCCCAATGGAGG 
AGTCCAAACCAAAGCCCAGGTCGTTCATGCCCAACTTGGTGCCTCCCAAGATCCCCGATGGAG 
AGAGAGTGGACTTTGATGACATCCACCGGAAGCGCATGGAGAAGGACCTGAATGAGTTGCAGG 
CGCTGATTGAGGCTCACTTTGAGAACAGGAAGAAAGAGGAGGAGGAGCTCGTTTCTCTCAAAG 
ACAGGATCGAGAGACGTCGGGCAGAGCGGGCCGAGCAGCAGCGCATCCGGAATGAGCGGGAGA 
AGGAGCGGCAGAACCGCCTGGCTGAAGAGAGGGCTCGACGAGAGGAGGAGGAGAACAGGAGGA 
AGGCTGAGGATGAGGCCCGGAAGAAGAAGGCTTTGTCCAACATGATGCATTTTGGGGGTTACA 
TCCAGAAGCAGGCCCAGACAGAGCGGAAAAGTGGGAAGAGGCAGACTGAGCGGGAAAAGAAGA 
AGAAGATTCTGGCTGAGAGGAGGAAGGTGCTGGCCATTGACCACCTGAATGAAGATCAGCTGA
GGGAGAAGGCCAAGGAGCTGTGGCAGAGCATCTATAACTTGGAGGCAGAGAAGTTCGACCTGC
AGGAGAAGTTCAAGCAGCAGAAATATGAGATCAATGTTCTCCGAAACAGGATCAACGATAACC
AGAAAGTCTCCAAGACCCGCGGGAAGGCTAAAGTCACCGGGCGCTGGAAATAGAGCCTGGCCT 
CCTTCACCAAAGATCTGCTCCTCGCTCGCACCTGCCTCCGGCCTGCACTCCCCCAGTTCCCGG 


GCCCTCCTGGGCACCCCAGGCAGCTCCTGTTTGGAAATGGGGAGCTGGCCTAGGTGGGAGCCA 


CCACTCCTGCCTGCCCCCACACCCACTCCACACCAGTAATAAAAAGCCACCACACACAAAAAA 


AAAAAAAAAAAAACCCAAAA 




ORF Start: ATG at 69 | ORF Stop: TAG at 933

SEQ ID NO: 70 | 288 aa | MW at 34589.9kD

NOV19b, CG111473-02 Protein Sequence


MSDIEEVVEEYEEEEQEEAAVEEQEEAAEEDAEAEAETEETRAEEDEEEEEAKEAEDGPMEES
KPKPRSFMPNLVPPKIPDGERVDFDDIHRKRMEKDLNELQALIEAHFENRKKEEEELVSLKDR 
IERRRAERAEQQRIRNEREKERQNRLAEERARREEEENRRKAEDEARKKKALSNMMHFGGYIQ 
KQAQTERKSGKRQTEREKKKKILAERRKVLAIDHLNEDQLREKAKELWQSIYNLEAEKFDLQE 
KFKQQKYEINVLRNRINDNQKVSKTRGKAKVTGRWK
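
The ORF coordinates and protein lengths in Table 19A can be cross-checked with simple arithmetic. The sketch below assumes the reported positions are 1-based and that the stop position marks the first base of the stop codon. The function name `orf_codons` is hypothetical.

```python
def orf_codons(start, stop):
    """Count coding codons between a start-codon position and a stop-codon position.

    Both positions are 1-based and refer to the first base of the codon.
    The stop codon itself is not counted.
    """
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


print(orf_codons(54, 735))  # NOV19a: 227, matching the stated 227 aa
print(orf_codons(69, 933))  # NOV19b: 288, matching the stated 288 aa
```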



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 19B. 



Table 19B. Comparison of NOV19a against NOV19b.

Protein Sequence | NOV19a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV19b | 128..227; 190..288 | 88/100 (88%); 93/100 (93%)
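
Each identities or similarities entry is a ratio of matching positions to alignment length, followed by a whole-number percentage. The sketch below reproduces the Table 19B figures. It assumes the percentage is truncated rather than rounded, which is an assumption about the alignment tool and is not stated in the text.

```python
def percent(matches, aligned_length):
    """Whole-number percentage of matches, truncated (assumed convention)."""
    return (100 * matches) // aligned_length


print(percent(88, 100))  # identities for NOV19a against NOV19b
print(percent(93, 100))  # similarities for NOV19a against NOV19b
```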




Further analysis of the NOV19a protein yielded the following properties shown in 
Table 19C. 



Table 19C. Protein Sequence Properties NOV19a

PSort analysis: 0.9725 probability located in nucleus; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted
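
The PSort result is a list of candidate locations with scores, and the highest score is read as the predicted localization. A minimal sketch of that selection follows, using the Table 19C values. The dictionary layout and function name are hypothetical and do not reflect PSort's actual output format.

```python
# Scores transcribed from Table 19C (NOV19a).
psort_nov19a = {
    "nucleus": 0.9725,
    "mitochondrial matrix space": 0.1000,
    "lysosome (lumen)": 0.1000,
    "endoplasmic reticulum (membrane)": 0.0000,
}


def most_likely_location(scores):
    """Return the location with the highest score."""
    return max(scores, key=scores.get)


print(most_likely_location(psort_nov19a))
```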



A search of the NOV19a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 19D.
Table 19D. Geneseq Results for NOV19a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV19a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAY91088 | Recombinant modified human cardiac troponin T SEQ ID NO:6 - Homo sapiens, 288 aa. [US6060278-A, 09-MAY-2000] | 1..227; 1..288 | 221/288 (76%); 226/288 (77%) | e-113
AAB12186 | Human troponin T cardiac isoform (cTnT) - Homo sapiens, 288 aa. [US6072040-A, 06-JUN-2000] | 1..227; 1..288 | 221/288 (76%); 226/288 (77%) | e-113
AAW72759 | Recombinant human cardiac troponin T - Homo sapiens, 288 aa. [US5834210-A, 10-NOV-1998] | 1..227; 1..288 | 221/288 (76%); 226/288 (77%) | e-113
AAW41574 | Human cardiac troponin T isoform T3 - Homo sapiens, 288 aa. [WO9739132-A1, 23-OCT-1997] | 1..227; 1..288 | 221/288 (76%); 226/288 (77%) | e-113
AAW76640 | Human cardiac HcTnT protein deletion mutant delta S275-K288 - Homo sapiens, 274 aa. [DE19815128-A1, 08-OCT-1998] | 1..213; 1..274 | 205/274 (74%); 212/274 (76%) | e-104
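
Expect values in these tables are written in a shorthand such as `e-113`, with the leading `1` omitted. A minimal sketch of how they could be converted to numbers and used to rank hits follows. The function name `parse_expect` is hypothetical, and reading `e-113` as `1e-113` is an assumption about the shorthand.

```python
def parse_expect(text):
    """Convert an Expect string to a float; 'e-113' is read as 1e-113."""
    text = text.strip().lower()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Two entries from Table 19D.
hits = {"AAW76640": "e-104", "AAY91088": "e-113"}
ranked = sorted(hits, key=lambda name: parse_expect(hits[name]))
print(ranked)
```

Smaller Expect values indicate matches less likely to arise by chance, so the most significant hit sorts first.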



In a BLAST search of public sequence databases, the NOV19a protein was found to
have homology to the proteins shown in the BLASTP data in Table 19E.



Table 19E. Public BLASTP Results for NOV19a

Protein Accession Number | Protein/Organism/Length | NOV19a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9BUF6 | Similar to troponin T2, cardiac - Homo sapiens (Human), 285 aa. | 1..227; 1..285 | 224/285 (78%); 225/285 (78%) | e-114
TPHUTC | troponin T, cardiac muscle - human, 298 aa. | 1..227; 1..298 | 221/298 (74%); 226/298 (75%) | e-110
AAK92232 | Truncated cardiac troponin T - Homo sapiens (Human), 274 aa. | 1..213; 1..274 | 205/274 (74%); 212/274 (76%) | e-103
A25345 | troponin T, cardiac muscle, major isoform - rabbit, 276 aa. | 1..211; 1..276 | 202/278 (72%); 212/278 (75%) | e-103
B25345 | troponin T, cardiac muscle, minor isoform - rabbit, 276 aa. | 2..221; 1..276 | 197/278 (70%); 206/278 (73%) | 3e-99
PFam analysis predicts that the NOV19a protein contains the domains shown in
Table 19F.



Table 19F. Domain Analysis of NOV19a

Pfam Domain | NOV19a Match Region | Identities; Similarities for the Matched Region | Expect Value
Troponin | 93..227 | 36/194 (19%); 105/194 (54%) | 0.0067



Example 20.

The NOV20 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 20A.



Table 20A. NOV20 Sequence Analysis 

jSEQIDNO:71 ~ "" 75332 bp j _ 

NOV20a, CG111501-01 DNA Sequence

AGGCTCCCAATCCCATCCTCATCTCTGCCCCTTCTTCTCAGAAGGATGGCCGACACCCAGACA
CAGGTGGCCCCCACACCAACCATGAGGATGGC
CCAGCCCTGGAGGACCTGCCACTGCCGCCACCCAAGGAATCCTTCTCCAAGTTCCATCAGCAG
CGGCaagctagtgagctccgccgcctctacaggcacatccaccctgagctccgcaagaatctg
gctgaggctgtggccgaggatctggctgaggtcctgggctctgaggaacccaccgagggtgac
gttcagtgcatgcgctggatctttgagaactggagactggatgccattggagaacacgagagg
ccagctgccaaggagcccgtgctgtgtggtgacgtccaggccacctcccgcaagtttgaggaa
ggctcctttgccaacagcacagaccaggagccaaccaggccccagccaggtggaggagacgtt
cgtgcagcccgctggctatttgagacaaagccactggacgagctgacagggcaagccaaggaa
ctggaggccactgtgagggagcctgcagccagcggagatgtgcagggtaccaggatgctcttt
gagacgcggccgctggaccgcctgggctcccgcccctccctgcaggagcagagccccttggaa
ctgcgctcagagatccaggagctgaagggtgatgtgaaaaagacagtgaagctcttccaaacg
gagcccctgtgtgccatccaggatgcagagggcgccatccatgaggtcaaggccgcatgccgg
gaggagatccaaagcaacgcggtgaggtctgcccgctggctctttgagacccggcctctggac
gccatcaaccaggaccccagccaggtgcgggtgatccgggggatttccctggaggagggggcc
cggcccgacgtcagtgcaactcgctggatctttgagacacagcccctggatgccatccgggag
atcttggtagatgagaaggacttccagccatccccagaccttatcccacctggtccagatgtt
cagcagcagcagcatctgtttgagacccgagcgctggacactctgaagggggacgaagaggct
ggagcagaggccccacccaaggaggaagtggtccctggtgatgtccgctccaccctgtggcta
tttgaaacaaagcccctggatgctttcagagacaaggtccaagtgggtcacctacagcgagtg
GATCCCCAGGACGGTGAGGGGCATCTATCCAGTGACAGCTCCTCAGCACTGCCCTTCTCTCAG
AGTGCCCCCCAGAGGGATGAGCTAAAGGGGGATGTGAAGACTTTTAAGAACCTTTTTGAGACC
CTTCCCTTGGACAGCATTGGACAGGGTGAGGTTCTGGCCCATGGGAGTCCAAGCAGAGAAGAA
GGAACTGATTCTGCTGGGCAGGCCCAGGGCATAGGGTCCCCAGTGTATGCCATGCAGGACAGC
AAGGGCCGCCTCCATGCCCTGACCTCTGTTAGCAGAGAGCAGATAGTCGGAGGTGATGTGCAG
GGCTACAGGTGGATGTTTGAGACACAGCCCCTAGACCAGCTCGGCCGAAGCCCCAGTACCATC 
GACGTGGTGCGGGGCATCACCCGGCAGGAAGTGGTGGCTGGGGACGTTGGCACAGCTCGGTGG 
CTTTTTGAGACCCAGCCCCTGGAGATGATCCACCAACGGGAGCAGCAGGAACGACAGAAAGAA 
GAAGGGAAGAGTCAGGGAGACCCCCAGCCTGAGGCACCCCCAAAGGGCGATGTGCAGACCATC
CGGTGGTTGTTCGAGACTTGCCCAATGAGTGAGTTGGCCGAAAAGCAGGGGTCAGAGGTCACA 
GATCCCACAGCCAAGGCTGAGGCACAGTCCTGCACCTGGATGTTCAAGCCCCAACCTGTGGAC 
AGGCCAGTGGGCTCCAGGGAGCAGCACCTGCAGGTTAGCCAGGTCCCGGCTGGGGAAAGACAG 
ACAGACAGACACGTCTTTGAGACCGAGCCTCTTCAGGCCTCAGGCCGTCCCTGTGGAAGACGG 
CCTGTGAGATACTGCAGCCGCGTGGAGATCCCTTCAGGGCAGGTGTCTCGTCAGAAAGAGGTT 
TTTCAGGCCCTGGAGGCAGGCAAGAAGGAAGAACAGGAGCCCCGGGTAATCGCTGGGTCCATC
CCCGCGGGTTCTGTCCACAAGTTCACTTGGCTTTTTGAGAATTGTCCCATGGGCTCCCTGGCA
GCTGAGAGCATCCAAGGGGGCAACCTCCTGGAAGAGCAGCCCATGAGCCCCTCAGGCAACAGG
ATGCAAGAGAGCCAGGAGACTGCAGCTGAGGGGACCCTGCGGACTCTGCATGCCACACCTGGC

ATCCTGCACCATGGAGGCATCCTCATGGAGGCCCGAGGGCCAGGGGAGCTCTGTCTTGCCAAG 

TATGTGCTCTCGGGCACAGGGCAGGGGCACCCTTATATACGAAAGGAGGAGCTGGTGTCAGGT 

GAACTTCCCAGGATCATCTGCCAAGTCCTGCGCCGGCCAGATGTGGACCAGCAGGGGCTGCTG 

GTGCAGGAAGACCCAACTGGCCAGCTCCAACTCAAGCCGCTGAGGCTGCCAACTCCAGGCAGC 

AGTGGGAATATTGAAGACATGGACCCTGAGCTCCAGCAGCTGCTGGCTTGCGGTCTTGGGACC 

TCCGTGGCAAGGACTGGGCTGGTGATGCAGGAGACAGAGCAGGGCCTGGTCGCACTGACTGCC 

TACTCTCTGCAGCCCCGGCTAACTAGCAAGGCCTCTGAGAGGAGCAGCGTGCAGCTGTTGGCC 

AGCTGCATAGATAAAGGAGACCTGAGTGGCCTGCACAGTCTGCGGTGGGAGCCCCCGGCTGAC

CCGAGTCCAGTGCCAGCCAGCGAGGGGGCCCAGAGCCTGCACCCAACTGAGAGCATCATCCAT 

GTTCCCCCACTGGACCCCAGCATGGGGATGGGGCATCTGAGAGCCTCAGGGGCCACCCCTTGC 

CCTCCTCAGGCCATTGGAAAGGCAGTCCCTCTGGCTGGGGAAGCTGCAGCACCAGCCCAATTG 

CAAAACACAGAAAAGCAGGAAGACAGTCACTCTGGACAGAAAGGGATGGCAGTCTTGGGAAAG 

TCAGAAGGAGCCACGACTACCCCTCCGGGGCCTGGGGCCCCAGACCTCCTGGCCGCCATGCAG 

AGTCTGCGGATGGCAACAGCTGAAGCCCAGAGCCTGCACCAGCAAGTTCTGAACAAGCACAAG

CAGGGCCCCACCCCAACAGCCACTTCCAACCCCATCCAGGACGGTCTTCGGAAAGCTGGGGCT 

ACCCAAAGCAACATAAGGCCTGGGGGTGGAAGTGATCCCCGGATCCCAGCAGCCCCCAGAAAG

GTCAGTCCTGACTTTCCAGCTGGAGCCCACCGTGCTGAGGACTCCATCCAGCAAGCCTCTGAG 

CCCCTGAAGGACCCCCTTCTTCACTCCCACAGCAGCCCTGCTGGCCAGAGAACCCCTGGAGGG 

TCACAGACAAAGACCCCAAAACTGGACCCCACCATGCCCCCAAAGAAGAAGCCGCAGCTGCCC 

CCTAAACCTGCACACCTAACCCAGAGCCACCCTCCTCAGAGGCTGCCCAAGCCCTTGCCTCTA 

TCTCCCAGCTTTTCCTCGGAGGTGGGGCAAAGAGAACACCAACGAGGTGAGAGAGATACAGCC 

ATCCCTCAGCCAGCCAAGGTTCCCACTACTGTAGACCAGGGCCACATACCTCTGGCCAGATGT 

CCCAGTGGACATAGCCAGCCCAGCTTACAACATGGCCTCAGCACCACGGCCCCCAGGCCCACC 

AAGAATCAGGCTACAGGCAGCAATGCCCAGAGCTCTGAGCCCCCCAAGCTCAATGCCCTCAAC 

CATGATCCCACCTCACCACAGTGGGGCCCCGGCCCCTCAGGAGAGCAGCCCATGGAAGGTTCC 

CACCAAGGGGCCCCTGAGAGCCCTGACAGTCTGCAAAGAAACCAGAAAGAGCTCCAGGGCCTC

CTGAACCAGGTGCAAGCCCTGGAGAAGGAGGCCGCAAGCAGTGTGGACGTGCAGGCCCTGCGG 

AGGCTCTTTGAGGCCGTGCCCCAGCTGGGAGGGGCTGCTCCTCAGGCTCCTGCTGCCCACCAA 

AAGCCCGAGGCCTCAGTGGAGCAGGCCTTTGGGGAGCTGACACGGGTCAGCACGGAAGTTGCT 

CAACTGAAGGAACAGACCTTGGCAAGGCTGCTGGACATTGAAGAGGCTGTGCACAAGGCACTC 

AGCTCCATGTCTAGCCTCCAGCCTGAGGCCAGTGCCAGAGGCCATTTCCAGGGACCTCCAAAA

GACCACAGTGCCCACAAGATCAGTGTCACAGTCAGCAGTAGCGCCAGGCCCAGTGGCTCAGGC 

CAGGAGGTCGGAGGTCAAACTGCAGTCAAGAACCAAGCCAAGGTTGAATGCCACACTGAGGCC 

CAGAGTCAAGTCAAGATCAGAAATCACACAGAGGCCAGAGGTCACACAGCCTCAACTGCCCCT 

TCCACCAGGAGGCAGGAGACATCAAGAGAGTATTTGTGCCCTCCTCGGGTTTTACCTTCCAGC 

CGAGATTCTCCCTCCTCCCCAACATTTATCTCCATCCAGTCGGCCACAAGGAAGCCTCTAGAG 

ACTCCCAGCTTTAAGGGCAACCCTGATGTCTCAGTGAAAAGCACACAACTGGCTCAGGACATA

GGCCAGGCCCTGCTCCACCAGAAAGGTGTCCAAGACAAAACTGGGAAGAAGGACATCACCCAG 

TGCTCTGTGCAACCTGAACCTGCCCCTCCCTCAGCCAGTCCCCTGCCCAGAGGGTGGCAAAAG

AGTGTTCTGGAGCTACAGACGGGGCCAGGGAGCTCACAACACTATGGAGCCATGAGAACCGTG 

ACTGAACAGTATGAGGAGGTGGACCAGTTTGGGAACACAGTCCTCATGTCTTCCACCACAGTC

ACCGAGCAGGCAGAGCCACCCAGGAACCCAGGCTCCCACCTCGGGCTCCACGCCTCCCCCTTG 

CTGAGGCAGTTCCTGCACAGCCCAGCTGGGTTCAGCAGTGACCTGACAGAAGCTGAGACGGTG 

CAGGTGTCCTGCAGCTACTCCCAGCCAGCTGCCCAGTGAGGCCCACCGCCTCCCACCACACCT 



GCCACCTGTTCCTGGCCTCCACTGCCCCAGGACTGAAGTGGGTACCTGCCTCCTGTACACTGG 



AGCAAGGACCAAGAGGAAATGGCATCTTCAGAGGATTACTGTGGGCCATTTCCCTTTCGCAGT 



TCTTTCAATAGGCCCAGTTCTTCCAAATGGAAAAAGAAAGGTCTGGAAGAGGCCCACAGAGTT 
GCACAGGCGTGGGGGTAGGATGGGGGC 



ORF Start: ATG at 46 | ORF Stop: TGA at 5140

SEQ ID NO: 72 | 1698 aa | MW at 183686.6kD

NOV20a, CG111501-01 Protein Sequence



MADTQTQVAPTPTMRMATAEDLPLPPPPALEDLPLPPPKESFSKFHQQRQASELRRLYRHIHP 
ELRKNLAEAVAEDLAEVLGSEEPTEGDVQCMRWIFENWRLDAIGEHERPAAKEPVLCGDVQAT 
SRKFEEGSFANSTDQEPTRPQPGGGDVRAARWLFETKPLDELTGQAKELEATVREPAASGDVQ 
GTRMLFETRPLDRLGSRPSLQEQSPLELRSEIQELKGDVKKTVKLFQTEPLCAIQDAEGAIHE 
VKAACREEIQSNAVRSARWLFETRPLDAINQDPSQVRVIRGISLEEGARPDVSATRWIFETQP 
LDAIREILVDEKDFQPSPDLIPPGPDVQQQQHLFETRALDTLKGDEEAGAEAPPKEEVVPGDV 
RSTLWLFETKPLDAFRDKVQVGHLQRVDPQDGEGHLSSDSSSALPFSQSAPQRDELKGDVKTF 
KNLFETLPLDSIGQGEVLAHGSPSREEGTDSAGQAQGIGSPVYAMQDSKGRLHALTSVSREQI 
VGGDVQGYRWMFETQPLDQLGRSPSTIDVVRGITRQEVVAGDVGTARWLFETQPLEMIHQREQ
QERQKEEGKSQGDPQPEAPPKGDVQTIRWLFETCPMSELAEKQGSEVTDPTAKAEAQSCTWMF 
KPQPVDRPVGSREQHLQVSQVPAGERQTDRHVFETEPLQASGRPCGRRPVRYCSRVEIPSGQV 
SRQKEVFQALEAGKKEEQEPRVIAGSIPAGSVHKFTWLFENCPMGSLAAESIQGGNLLEEQPM 
SPSGNRMQESQETAAEGTLRTLHATPGILHHGGILMEARGPGELCLAKYVLSGTGQGHPYIRK 
EELVSGELPRIICQVLRRPDVDQQGLLVQEDPTGQLQLKPLRLPTPGSSGNIEDMDPELQQLL 






ACGLGTSVARTGLVMQETEQGLVALTAYSLQPRLTSKASERSSVQLLASCIDKGDLSGLHSLR

WEPPADPSPVPASEGAQSLHPTESIIHVPPLDPSMGMGHLRASGATPCPPQAIGKAVPLAGEA

AAPAQLQNTEKQEDSHSGQKGMAVLGKSEGATTTPPGPGAPDLLAAMQSLRMATAEAQSLHQQ

VLNKHKQGPTPTATSNPIQDGLRKAGATQSNIRPGGGSDPRIPAAPRKVSPDFPAGAHRAEDS 

IQQASEPLKDPLLHSHSSPAGQRTPGGSQTKTPKLDPTMPPKKKPQLPPKPAHLTQSHPPQRL 

PKPLPLSPSFSSEVGQREHQRGERDTAIPQPAKVPTTVDQGHIPLARCPSGHSQPSLQHGLST 

TAPRPTKNQATGSNAQSSEPPKLNALNHDPTSPQWGPGPSGEQPMEGSHQGAPESPDSLQRNQ

KELQGLLNQVQALEKEAASSVDVQALRRLFEAVPQLGGAAPQAPAAHQKPEASVEQAFGELTR 

VSTEVAQLKEQTLARLLDIEEAVHKALSSMSSLQPEASARGHFQGPPKDHSAHKISVTVSSSA 

RPSGSGQEVGGQTAVKNQAKVECHTEAQSQVKIRNHTEARGHTASTAPSTRRQETSREYLCPP 

RVLPSSRDSPSSPTFISIQSATRKPLETPSFKGNPDVSVKSTQLAQDIGQALLHQKGVQDKTG 

KKDITQCSVQPEPAPPSASPLPRGWQKSVLELQTGPGSSQHYGAMRTVTEQYEEVDQFGNTVL 

MSSTTVTEQAEPPRNPGSHLGLHASPLLRQFLHSPAGFSSDLTEAETVQVSCSYSQPAAQ 



SEQ ID NO: 73 | 3333 bp

NOV20b, CG111501-02 DNA Sequence



AGGCTCCCAATCCCATCCTCATCTCTGCCCCTTCTTCTCAGAAGGATGGCCGACACCCAGACA 



CAGGTGGCCCCCACACCAACCATGAGGATGGCAACTGCAGAGGACCTGCCCCTCCCTCCACCC 
CCAGCCCTGGAGGACCTGCCACTGCCGCCACCCAAGGAATCCTTCTCCAAGTTCCATCAGCAG 
CGGCAAGCTAGTGAGCTCCGCCGCCTCTACAGGCACATCCACCCTGAGCTCCGCAAGAATCTG
GCTGAGGCTGTGGCCGAGGATCTGGCTGAGGTCCTGGGCTCTGAGGAACCCACCGAGGGTGAC 
GTTCAGTGCATGCGCTGGATCTTTGAGAACTGGAGACTGGATGCCATTGGAGAACACGAGAGG 
CCAGCTGCCAAGGAGCCCGTGCTGTGTGGTGACGTCCAGGCCACCTCCCGCAAGTTTGAGGAA 
GGCTCCTTTGCCAACAGCACAGACCAGGAGCCAACCAGGCCCCAGCCAGGTGGAGGAGACGTT 
CGTGCAGCCCGCTGGCTATTTGAGACAAAGCCACTGGACGAGCTGACAGGGCAAGCCAAGGAA 
CTGGAGGCCACTGTGAGGGAGCCTGCAGCCAGCGGAGATGTGCAGGGTACCAGGATGCTCTTT 
GAGACGCGGCCGCTGGACCGCCTGGGCTCCCGCCCCTCCCTGCAGGAGCAGAGCCCCTTGGAA 
CTGCGCTCAGAGATCCAGGAGCTGAAGGGTGATGTGAAAAAGACAGTGAAGCTCTTCCAAACG
GAGCCCCTGTGTGCCATCCAGGATGCAGAGGGCGCCATCCATGAGGTCAAGGCCGCATGCCGG 
GAGGAGATCCAAAGCAACGCGGTGAGGTCTGCCCGCTGGCTCTTTGAGACCCGGCCTCTGGAC 
GCCATCAACCAGGACCCCAGCCAGGTGCGGGTGATCCGGGGGATTTCCCTGGAGGAGGGGGCC 
CGGCCCGACGTCAGTGCAACTCGCTGGATCTTTGAGACACAGCCCCTGGATGCCATCCGGGAG 
ATCTTGGTAGATGAGAAGGACTTCCAGCCATCCCCAGACCTTATCCCACCTGGTCCAGATGTT 
CAGCAGCAGCAGCATCTGTTTGAGACCCGAGCGCTGGACACTCTGAAGGGGGACGAAGAGGCT 
GGAGCAGAGGCCCCACCCAAGGAGGAAGTGGTCCCTGGTGATGTCCGCTCCACCCTGTGGCTA
TTTGAAACAAAGCCCCTGGATGCTTTCAGAGACAAGGTCCAAGTGGGTCACCTACAGCGAGTG 
GATCCCCAGGACGGTGAGGGGCATCTATCCAGTGACAGCTCCTCAGCACTGCCCTTCTCTCAG 
AGTGCCCCCCAGAGGGATGAGCTAAAGGGGGATGTGAAGACTTTTAAGAACCTTTTTGAGACC 
CTTCCCTTGGACAGCATTGGACAGGGTGAGGTTCTGGCCCATGGGAGTCCAAGCAGAGAAGAA 
GGAACTGATTCTGCTGGGCAGGCCCAGGGCATAGGGTCCCCAGTGTATGCCATGCAGGACAGC 
AAGGGCCGCCTCCATGCCCTGACCTCTGTTAGCAGAGAGCAGATAGTCGGAGGTGATGTGCAG 
GGCTACAGGTGGATGTTTGAGACACAGCCCCTAGACCAGCTCGGCCGAAGCCCCAGTACCATC 
GACGTGGTGCGGGGCATCACCCGGCAGGAAGTGGTGGCTGGGGACGTTGGCACAGCTCGGTGG 
CTTTTTGAGACCCAGCCCCTGGAGATGATCCACCAACGGGAGCAGCAGGAACGACAGAAAGAA
GAAGGGAAGAGTCAGGGAGACCCCCAGCCTGAGGCACCCCCAAAGGGCGATGTGCAGACCATC
CGGTGGTTGTTCGAGACTTGCCCAATGAGTGAGTTGGCCGAAAAGCAGGGGTCAGAGGTCACA
GATCCCACAGCCAAGGCTGAGGCACAGTCCTGCACCTGGATGTTCAAGCCCCAACCTGTGGAC
AGGCCAGTGGGCTCCAGGGAGCAGCACCTGCAGGTTAGCCAGGTCCCGGCTGGGGAAAGACAG
ACAGACAGACACGTCTTTGAGACCGAGCCTCTTCAGGCCTCAGGCCGTCCCTGTGGAAGACGG 
CCTGTGAGATACTGCAGCCGCGTGGAGATCCCTTCAGGGCAGGTGTCTCGTCAGAAAGAGGTT 
TTTCAGGCCCTGGAGGCAGGCAAGAAGGAAGAACAGGAGCCCCGGGTAATCGCTGGGTCCATC 
CCCGCGGGTTCTGTCCACAAGTTCACTTGGCTTTTTGAGAATTGTCCCATGGGCTCCCTGGCA 
GCTGAGAGCATCCAAGGGGGCAACCTCCTGGAAGAGCAGCCCATGAGCCCCTCAGGCAACAGG 
ATGCAAGAGAGCCAGGAGACTGCAGCTGAGGGGACCCTGCGGACTCTGCATGCCACACCTGGC 
ATCCTGCACCATGGAGGCATCCTCATGGAGGCCCGAGGGCCAGGGGAGCTCTGTCTTGCCAAG 
TATGTGCTCTCGGGCACAGGGCAGGGGCACCCTTATATACGAAAGGAGGAGCTGGTGTCAGGT 
GAACTTCCCAGGATCATCTGCCAAGTCCTGCGCCGGCCAGATGTGGACCAGCAGGGGCTGCTG 
GTGCAGGAAGACCCAACTGGCCAGCTCCAACTCAAGCCGCTGAGGCTGCCAACTCCAGGCAGC 
AGTGGGAATATTGAAGACATGGACCCTGAGCTCCAGCAGCTGCTGGCTTGCGGTCTTGGGACC
TCCGTGGCAAGGACTGGGCTGGTGATGCAGGAGACAGAGCAGGGCCTGGTCGCACTGACTGCC
TACTCTCTGCAGCCCCGGCTAACTAGCAAGGCCTCTGAGAGGAGCAGCGTGCAGCTGTTGGCC 
AGCTGCATAGATAAAGGAGACCTGAGTGGCCTGCACAGTCTGCGGTGGGAGCCCCCGGCTGAC 
CCGAGTCCAGTGCCAGCCAGCGAGGGGGCCCAGAGCCTGCACCCAACTGAGAGCATCATCCAT 
GTTCCCCCACTGGACCCCAACAGCCACTTCCAACCCCATCCAGGACGGTCTTCGGAAAGCTGG 
GGCTACCCAAAGCAACATAAGGCCTGGGGGTGGAAGTGATCCCCGGATCCCAGCAGCCCCCAG
AAAGCTGCTGTGACAGGACCTGACTTTCCAGCTGGAGCCCACCGTGCTGAGGACTCCATCCAG 
CAAGCCTCTGAGCCCCTGAAGGACCCCCTTCTTCACTCCCACAGCAGCCCTGCTGGCCAGAGA 





ACCCCTGGAGGGTCACAGACAAAGACCCCAAAACTGGACCCCACCATGCCCCCAAAGAAGAAG 




CCGCAGCTGCCCCCTATATCTGCACACCTAACCCAGAGCCCCCCTCCTCAGAGGCTG 


ORF Start: ATG at 46 | ORF Stop: TGA at 3061

SEQ ID NO: 74 | 1005 aa | MW at 110888.2kD

NOV20b, CG111501-02 Protein Sequence


MADTQTQVAPTPTMRMATAEDLPLPPPPALEDLPLPPPKESFSKFHQQRQASELRRLYRHIHP 
ELRKNLAEAVAEDLAEVLGSEEPTEGDVQCMRWIFENWRLDAIGEHERPAAKEPVLCGDVQAT 
SRKFEEGSFANSTDQEPTRPQPGGGDVRAARWLFETKPLDELTGQAKELEATVREPAASGDVQ 
GTRMLFETRPLDRLGSRPSLQEQSPLELRSEIQELKGDVKKTVKLFQTEPLCAIQDAEGAIHE 
VKAACREEIQSNAVRSARWLFETRPLDAINQDPSQVRVIRGISLEEGARPDVSATRWIFETQP
LDAIREILVDEKDFQPSPDLIPPGPDVQQQQHLFETRALDTLKGDEEAGAEAPPKEEVVPGDV
RSTLWLFETKPLDAFRDKVQVGHLQRVDPQDGEGHLSSDSSSALPFSQSAPQRDELKGDVKTF
KNLFETLPLDSIGQGEVLAHGSPSREEGTDSAGQAQGIGSPVYAMQDSKGRLHALTSVSREQI
VGGDVQGYRWMFETQPLDQLGRSPSTIDVVRGITRQEVVAGDVGTARWLFETQPLEMIHQREQ
QERQKEEGKSQGDPQPEAPPKGDVQTIRWLFETCPMSELAEKQGSEVTDPTAKAEAQSCTWMF 
KPQPVDRPVGSREQHLQVSQVPAGERQTDRHVFETEPLQASGRPCGRRPVRYCSRVEIPSGQV 
SRQKEVFQALEAGKKEEQEPRVIAGSIPAGSVHKFTWLFENCPMGSLAAESIQGGNLLEEQPM 
SPSGNRMQESQETAAEGTLRTLHATPGILHHGGILMEARGPGELCLAKYVLSGTGQGHPYIRK 
EELVSGELPRIICQVLRRPDVDQQGLLVQEDPTGQLQLKPLRLPTPGSSGNIEDMDPELQQLL 
ACGLGTSVARTGLVMQETEQGLVALTAYSLQPRLTSKASERSSVQLLASCIDKGDLSGLHSLR 
WEPPADPSPVPASEGAQSLHPTESIIHVPPLDPNSHFQPHPGRSSESWGYPKQHKAWGWK 


SEQ ID NO: 75 | 1819 bp

NOV20c, 249257832 DNA Sequence


CACCAAGCTTATGGCCGACACCCAGACACAGGTGGCCCCCACACCAACCATGAGGATGGCAAC 
TGCAGAGGACCTGCCCCTCCCTCCACCCCCAGCCCTGGAGGACCTGCCACTGCCGCCACCCAA 
GGAATCCTTCTCCAAGTTCCATCAGCAGCGGCAAGCTAGTGAGCTCCGCCGCCTCTACAGGCA 
CATCCACCCTGAGCTCCGCAAGAATCTGGCTGAGGCTGTGGCCGAGGATCTGGCTGAGGTCCT 
GGGCTCTGAGGAACCCACCGAGGGTGACGTTCAGTGCATGCGCTGGATCTTTGAGAACTGGAG
ACTGGATGCCATTGGAGAACACGAGAGGCCAGCTGCCAAGGAGCCCGTGCTGTGTGGTGACGT 
CCAGGCCACCTCCCGCAAGTTTGAGGAAGGCTCCTTTGCCAACAGCACAGACCAGGAGCCAAC 
CAGGCCCCAGCCAGGTGGAGGAGACGTTCGTGCAGCCCGCTGGCTATTTGAGACAAAGCCACT
GGACGAGCTGACAGGGCAAGCCAAGGAACTGGAGGCCACTGTGAGGGAGCCTGCAGCCAGCGG 
AGATGTGCAGGGTACCAGGATGCTCTTTGAGACGCGGCCGCTGGACCGCCTGGGCTCCCGCCC 
CTCCCTGCAGGAGCAGAGCCCCTTGGAACTGCGCTCAGAGATCCAGGAGCTGAAGGGTGATGT 
GAAAAAGACAGTGAAGCTCTTCCAAACGGAGCCCCTGTGTGCCATCCAGGATGCAGAGGGCGC 
CATCCATGAGGTCAAGGCCGCATGCCGGGAGGAGATCCAAAGCAACGCGGTGAGGTCTGCCCG 
CTGGCTCTTTGAGACCCGGCCTCTGGACGCCATCAACCAGGACCCCAGCCAGGTGCGGGTGAT 
CCGGGGGATTTCCCTGGAGGAGGGGGCCCGGCCCGACGTCAGTGCAACTCGCTGGATCTTTGA 
GACACAGCCCCTGGATGCCATCCGGGAGATCTTGGTAGATGAGAAGGACTTCCAGCCATCCCC
AGACCTTATCCCACCTGGTCCAGATGTTCAGCAGCAGCGGCATCTGTTTGAGACCCGAGCGCT 
GGACACTCTGAAGGGGGACGAAGAGGCTGGAGCAGAGGCCCCACCCAAGGAGGAAGTGGTCCC 
TGGTGATGTCCGCTCCACCCTGTGGCTATTTGAAACAAAGCCCCTGGATGCTTTCAGAGACAA 
GGTCCAAGTGGGTCACCTACAGCGAGTGGATCCCCAGGACGGTGAGGGGCATCTATCCAGTGA 
CAGCTCCTCAGCACTGCCCTTCTCTCAGAGTGCCCCCCAGAGGGATGAGCTAAAGGGGGATGT 
GAAGACTTTTAAGAACCTTTTTGAGACCCTTCCCTTGGACAGCATTGGACAGGGTGAGGTTCT 
GGCCCATGGGAGTCCAAGCAGAGAAGAAGGAACTGATTCTGCTGGGCAGGCCCAGGGCATAGG 
GTCCCCAGTGTATGCCATGCAGGACAGCAAGGGCCGCCTCCATGCCCTGACCTCTGTTAGCAG 
AGAGCAGATAGTCGGAGGTGATGTGCAGGGCTACAGGTGGATGTTTGAGACACAGCCCCTAGA 
CCAGCTCGGCCGAAGCCCCAGTACCATCGACGTGGTGCGGGGCATCACCCGGCAGGAAGTGGT 
GGCTGGGGACGTTGGCACAGCTCGGTGGCTTTTTGAGACCCAGCCCCTGGAGATGATCCACCA 
ACGGGAGCAGCAGGAACGACAGAAAGAAGAAGGGAAGAGTCAGGGAGACCCCCAGCCTGAGGC 
ACCCCCAAAGGGCGATGTGCAGACCATCCGGTGGTTGTTCGAGACTCTCGAGGGC 


ORF Start: at 2 | ORF Stop: end of sequence

SEQ ID NO: 76 | 606 aa | MW at 67469.6kD

NOV20c, 249257832 Protein Sequence

! 


TKLMADTQTQVAPTPTMRMATAEDLPLPPPPALEDLPLPPPKESFSKFHQQRQASELRRLYRH 
IHPELRKNLAEAVAEDLAEVLGSEEPTEGDVQCMRWIFENWRLDAIGEHERPAAKEPVLCGDV 
QATSRKFEEGSFANSTDQEPTRPQPGGGDVRAARWLFETKPLDELTGQAKELEATVREPAASG 
DVQGTRMLFETRPLDRLGSRPSLQEQSPLELRSEIQELKGDVKKTVKLFQTEPLCAIQDAEGA 
IHEVKAACREEIQSNAVRSARWLFETRPLDAINQDPSQVRVIRGISLEEGARPDVSATRWIFE
TQPLDAIREILVDEKDFQPSPDLIPPGPDVQQQRHLFETRALDTLKGDEEAGAEAPPKEEVVP
GDVRSTLWLFETKPLDAFRDKVQVGHLQRVDPQDGEGHLSSDSSSALPFSQSAPQRDELKGDV
KTFKNLFETLPLDSIGQGEVLAHGSPSREEGTDSAGQAQGIGSPVYAMQDSKGRLHALTSVSR
EQIVGGDVQGYRWMFETQPLDQLGRSPSTIDVVRGITRQEVVAGDVGTARWLFETQPLEMIHQ
REQQERQKEEGKSQGDPQPEAPPKGDVQTIRWLFETLEG 




SEQ ID NO: 77 | 1216 bp

NOV20d, 249263153 DNA Sequence


CACCAAGCTTTCAGGAGAGCAGCCCATGGAAGGTTCCCACCAAGGGGCCCCTGAGAGCCCTGA 
CAGTCTGCAAAGAAACCAGAAAGAGCTCCAGGGCCTCCTGAACCAGGTGCAAGCCCTGGAGAA
GGAGGCCGCAAGCAGTGTGGACGTGCAGGCCCTGCGGAGGCTCTTTGAGGCCGTGCCCCAGCT
GGGAGGGGCTGCTCCTCAGGCTCCTGCTGCCCACCAAAAGCCCGAGGCCTCAGTGGAGCAGGC
CTTTGGGGAGCTGACACGGGTCAGCACGGAAGTTGCTCAACTGAAGGAACAGACCTTGGCAAG

GGCCAGTGCCAGAGGCCATTTCCAGGGACCTCCAAAAGACCACAGTGCCCACAAGATCAGTGT 
CACAGTCAGCAGTAGCGCCAGGCCCAGTGGCTCAGGCCAGGAGGTCAGAGGTCAAACTGCAGT 
CAAGAACCAAGCCAAGGTTGAATGCCACACTGAGGCCCAGAGTCAAGTCAAGATCAGAAATCA 
CACAGAGGCCAGAGGTCACACAGCCTCAACTGCCCCTTCCACCAGGAGGCAGGAGACATCAAG 
AGAGTATTTGTGCCCTCCTCGGGTTTTACCTTCCAGCCGAGATTCTCCCTCCTCCCCAACATT 
TATCTCCATCCAGTCGGCCACAAGGAAGCCTCTAGAGACTCCCAGCTTTAAGGGCAACCCTGA 
TGTCTCAGTGAAAAGCACACAACTGGCTCAGGACATAGGCCAGGCCCTGCTCCACCAGAAAGG
TGTCCAAGACAAAACTGGGAAGAAGGACATCACCCAGTGCTCTGTGCAACCTGAACCTGCCCC 
TCCCTCAGCCAGTCCCCTGCCCAGAGGGTGGCAAAAGAGTGTTCTGGAGCTACAGACGGGGCC 
AGGGAGCTCACAACACTATGGAGCCATGAGAACCGTGACTGAACAGTATGAGGAGGTGGACCA 
GTTTGGGAACACAGTCCTCATGTCTTCCACCACAGTCACCGAGCAGGCAGAGCCACCCAGGAA 
CCCAGGCTCCCACCTCGGGCTCCACGCCTCCCCCTTGCTGAGGCAGTTCCTGCACAGCCCAGC 
TGGGTTCAGCAGTGACCTGACAGAAGCTGAGACGGTGCAGGTGTCCTGCAGCTACTCCCAGCC
AGCTGCCCAGCTCGAGGGC




ORF Start: at 2 | ORF Stop: end of sequence

SEQ ID NO: 78 | 405 aa | MW at 43419.8kD

NOV20d, 249263153 Protein Sequence


TKLSGEQPMEGSHQGAPESPDSLQRNQKELQGLLNQVQALEKEAASSVDVQALRRLFEAVPQL 
GGAAPQAPAAHQKPEASVEQAFGELTRVSTEVAQLKEQTLARLLDIEEAVHKALSSMSSLQPE 
ASARGHFQGPPKDHSAHKISVTVSSSARPSGSGQEVRGQTAVKNQAKVECHTEAQSQVKIRNH 
TEARGHTASTAPSTRRQETSREYLCPPRVLPSSRDSPSSPTFISIQSATRKPLETPSFKGNPD 
VSVKSTQLAQDIGQALLHQKGVQDKTGKKDITQCSVQPEPAPPSASPLPRGWQKSVLELQTGP 
GSSQHYGAMRTVTEQYEEVDQFGNTVLMSSTTVTEQAEPPRNPGSHLGLHASPLLRQFLHSPA 
GFSSDLTEAETVQVSCSYSQPAAQLEG 




SEQ ID NO: 79 | 1216 bp

NOV20e, 249263166 DNA Sequence


CACCAAGCTTTCAGGAGAGCAGCCCATGGAAGGTTCCCACCAAGGGGCCCCTGAGAGCCCTGA
CAGTCTGCAAAGAAACCAGAAAGAGCTCCAGGGCCTCCTGAACCAGGTGCAAGCCCTGGAGAA 
GGAGGCCGCAAGCAGTGTGGACGTGCAGGCCCTGCGGAGGCTCTTTGAGGCCGTGCCCCAGCT 
GGGAGGGGCTGCTCCTCAGGCTCCTGCTGCCCACCAAAAGCCCGAGGCCTCAGTGGAGCAGGC
CTTTGGGGAGCTGACACGGGTCAGCACGGAAGTTGCTCAACTGAAGGAACAGACCTTGGCAAG 
GCTGCTGGACATTGAAGAGGCTGTGCACAAGGCACTCAGCTCCATGTCTAGCCTCCAGCCTGA 
GGCCAGTGCCAGAGGCCATTTCCAGGGACCTCCAAAAGACCACAGTGCCCACAAGATCAGTGT 
CACAGTCAGCAGTAGCGCCAGGCCCAGTGGCTCAGGCCAGGAGGTCGGAGGTCAAACTGCAGT 
CAAGAACCAAGCCAAGGTTGAATGCCACACTGAGGCCCAGAGTCAAGTCAAGATCAGAAATCA 
CACAGAGGCCAGAGGTCACACAGCCTCAACTGCCCCTTCCACCAGGAGGCAGGAGACATCAAG 
AGAGTATTTGTGCCCTCCTCGGGTTTTACCTTCCAGCCGAGATTCTCCCTCCTCCCCAACATT 
TATCTCCATCCAGTCGGCCACAAGGAAGCCTCTAGAGACTCCCAGCTTTAAGGGCAACCCTGA 
TGTCTCAGTGAAAAGCACACAACTGGCTCAGGACATAGGCCAGGCCCTGCTCCACCAGAAAGG
TGTCCAAGACAAAACTGGGAAGAAGGACATCACCCAGTGCTCTGTGCAACCTGAACCTGCCCC 
TCCCTCAGCCAGTCCCCTGCCCAGAGGGTGGCAAAAGAGTGTTCTGGAGCTACAGACGGGGCC 
AGGGAGCTCACAACACTATGGAGCCATGAGAACCGTGACTGAACAGTATGAGGAGGTGGACCA 
GTTTGGGAACACAGTCCTCATGTCTTCCACCACAGTCACCGAGCAGGCAGAGCCACCCAGGAA 
CCCAGGCTCCCACCTCGGGCTCCACGCCTCCCCCTTGCTGAGGCAGTTCCTGCACAGCCCAGC 
TGGGTTCAGCAGTGACCTGACAGAAGCTGAGACGGTGCAGGTGTCCTGCAGCTACTCCCAGCC 
AGCTGCCCAGCTCGAGGGC


r " 


ORF Start: at 2 | ORF Stop: end of sequence

SEQ ID NO: 80 | 405 aa | MW at 43320.7kD

NOV20e, 249263166 Protein Sequence


TKLSGEQPMEGSHQGAPESPDSLQRNQKELQGLLNQVQALEKEAASSVDVQALRRLFEAVPQL 
GGAAPQAPAAHQKPEASVEQAFGELTRVSTEVAQLKEQTLARLLDIEEAVHKALSSMSSLQPE
ASARGHFQGPPKDHSAHKISVTVSSSARPSGSGQEVGGQTAVKNQAKVECHTEAQSQVKIRNH 
TEARGHTASTAPSTRRQETSREYLCPPRVLPSSRDSPSSPTFISIQSATRKPLETPSFKGNPD 
VSVKSTQLAQDIGQALLHQKGVQDKTGKKDITQCSVQPEPAPPSASPLPRGWQKSVLELQTGP 
GSSQHYGAMRTVTEQYEEVDQFGNTVLMSSTTVTEQAEPPRNPGSHLGLHASPLLRQFLHSPA 
GFSSDLTEAETVQVSCSYSQPAAQLEG 


SEQ ID NO: 81 | 1216 bp

NOV20f, 249263170 DNA Sequence


CACCAAGCTTTCAGGAGAGCAGCCCATGGAAGGTTCCCACCAAGGGGCCCCTGAGAGCCCTGA 
CAGTCTGCAAAGAAACCAGAAAGAGCTCCAGGGCCTCCTGAACCAGGTGCAAGCCCTGGAGAA 
GGAGGCCGCAAGCAGTGTGGACGTGCAGGCCCTGCGGAGGCTCTTTGAGGCCGTGCCCCAGCT 
GGGAGGGGCTGCTCCTCAGGCTCCTGCTGCCCACCAAAAGCCCGAGGCCTCAGTGGAGCAGGC 
CTTTGGGGAGCTGACACGGGTCAGCACGGAAGTTGCTCAACTGAAGGAACAGACCTTGGCAAG 
GCTGCTGGACATTGAAGAGGCTGTGCACAAGGCACTCAGCTCCATGTCTAGCCTCCAGCCTGA 
GGCCAGTGCCAGAGGCCATTTCCAGGGACCTCCAAAAGACCACAGTGCCCACAAGATCAGTGT 
CACAGTCAGCAGTAGCGCCAGGCCCAGTGGCTCAGGCCAGGAGGTCGGAGGTCAAACTGTAGT 
CAAGAACCAAGCCAAGGTTGAATGCCACACTGAGGCCCAGAGTCAAGTCAAGATCAGAAATCA 
CACAGAGGCCAGAGGTCACACAGCCTCAACTGCCCCTTCCACCAGGAGGCAGGAGACATCAAG 
AGAGTATTTGTGCCCTCCTCGGGTTTTACCTTCCAGCCGAGATTCTCCCTCCTCCCCAACATT 
TATCTCCATCCAGTCGGCCACAAGGAAGCCTCTAGAGACTCCCAGCTTTAAGGGCAACCCTGA 
TGTCTCAGTGAAAAGCACACAACTGGCTCAGGACATAGGCCAGGCCCTGCTCCACCAGAAAGG 
TGTCCAAGACAAAACTGGGAAGAAGGACATCACCCAGTGCTCTGTGCAACCTGAACCTGCCCC 
TCCCTCAGCCAGTCCCCTGCCCAGAGGGTGGCAAAAGAGTGTTCTGGAGCTACAGACGGGGCC 
AGGGAGCTCACAACACTATGGAGCCATGAGAACCGTGACTGAACAGTATGAGGAGGTGGACCA 
GTTTGGGAACACAGTCCTCATGTCTTCCACCACAGTCACCGAGCAGGCAGAGCCACCCAGGAA
CCCAGGCTCCCACCTCGGGCTCCACGCCTCCCCCTTGCTGAGGCAGTTCCTGCACAGCCCAGC
TGGGTTCAGCAGTGACCTGACAGAAGCTGAGACGGTGCAGGTGTCCTGCAGCTACTCCCAGCC 
AGCTGCCCAGCTCGAGGGC 




ORF Start: at 2 | ORF Stop: end of sequence

SEQ ID NO: 82 | 405 aa | MW at 43348.7kD

NOV20f, 249263170 Protein Sequence


TKLSGEQPMEGSHQGAPESPDSLQRNQKELQGLLNQVQALEKEAASSVDVQALRRLFEAVPQL 
GGAAPQAPAAHQKPEASVEQAFGELTRVSTEVAQLKEQTLARLLDIEEAVHKALSSMSSLQPE 
ASARGHFQGPPKDHSAHKISVTVSSSARPSGSGQEVGGQTVVKNQAKVECHTEAQSQVKIRNH
TEARGHTASTAPSTRRQETSREYLCPPRVLPSSRDSPSSPTFISIQSATRKPLETPSFKGNPD 
VSVKSTQLAQDIGQALLHQKGVQDKTGKKDITQCSVQPEPAPPSASPLPRGWQKSVLELQTGP 
GSSQHYGAMRTVTEQYEEVDQFGNTVLMSSTTVTEQAEPPRNPGSHLGLHASPLLRQFLHSPA 
GFSSDLTEAETVQVSCSYSQPAAQLEG 




SEQ ID NO: 83 | 2115 bp

NOV20g, CG111501-03 DNA Sequence

CAAGAAGGTGTCTGTTGGAGCCAGCAGAACAGAACCAATTTGAACAAGAACCTCCAGAGGAAC
GACGAACCCTGAGACCACAGCTGCTACAGACCACAAACACCCCATCAGCCAAGAGAGACCCTT
GCTGCTGTGACAGGACCTGACTTTCCAGCTGGAGCCCACCGTGCTGAGGACTCCATCCAGCAA
GCCTCTGAGCCCCTGAAGGACCCCCTTCTTCACTCCCACAGCAGCCCTGCTGGCCAGAGAACC
CCTGGAGGGTCACAGACAAAGACCCCAAAACTGGACCCCACCATGCCCCCAAAGAAGAAGCCG
CAGCTGCCCCCTAAACCTGCACACCTAACCCAGAGCCACCCTCCTCAGAGGCTGCCCAAGCCC
TTGCCTCTATCTCCCAGCTTTTCCTCGGAGGTGGGGCAAAGAGAACACCAACGAGGTGAGAGA 
GATACAGCCATCCCTCAGCCAGCCAAGGTTCCCACTACTGTAGACCGAGGCCACATACCTCTG 
GCCAGATGTCCCAGTGGACATAGCCAGCCCAGCTTACAACATGGCCTCAGCACCACGGCCCCC 
AGGCCCACCAAGAATCAGGCTACAGGCAGCAATGCCCAGAGCTCTGAGCCCCCCAAGCTCAAT 
GCCCTCAACCATGATCCCACCTCACCACAGTGGGGCCCCGGCCCCTCAGGAGAGCAGCCCATG 
GAAGGTTCCCACCAAGGGGCCCCTGAGAGCCCTGACAGTCTGCAAAGAAACCAGAAAGAGCTC 
CAGGGCCTCCTGAACCAGGTGCAAGCCCTGGAGAAGGAGGCCGCAAGCAGTGTGGACGTGCAG 
GCCCTGCGGAGGCTCTTTGAGGCCGTGCCCCAGCTGGGAGGGGCTGCTCCTCAGGCTCCTGCT 
GCCCACCAAAAGCCCGAGGCCTCAGTGGAGCAGGCCTTTGGGGAGCTGACACGGGTCAGCACG 
GAAGTTGCTCAACTGAAGGAACAGACCTTGGCAAGGCTGCTGGACATTGAAGAGGCTGTGCAC 
AAGGCACTCAGCTCCATGTCTAGCCTCCAGCCTGAGGCCAGTGCCAGAGGCCATTTCCAGGGA 
CCTCCAAAAGACCACAGTGCCCACAAGATCAGTGTCACAGTCAGCAGTAGCGCCAGGCCCAGT 
GGCTCAGGCCAGGAGGTCGGAGGTCAAACTGCAGTCAAGAACCAAGCCAAGGTTGAATGCCAC 
ACTGAGGCCCAGAGTCAAGTCAAGATCAGAAATCACACAGAGGCCAGAGGTCACACAGCCTCA 
ACTGCCCCTTCCACCAGGAGGCAGGAGACATCAAGAGAGTATTTGTGCCCTCCTCGGGTTTTA 
CCTTCCAGCCGAGATTCTCCCTCCTCCCCAACATTTATCTCCATCCAGTCGGCCACAAGGAAG 
CCTCTAGAGACTCCCAGCTTTAAGGGCAACCCTGATGTCTCAGTGAAAAGCACACAACTGGCT 
CAGGACATAGGCCAGGCCCTGCTCCACCAGAAAGGTGTCCAAGACAAAACTGGGAAGAAGGAC 
ATCACCCAGTGCTCTGTGCAACCTGAACCTGCCCCTCCCTCAGCCAGTCCCCTGCCCAGAGGG 
TGGCAAAAGAGTGTTCTGGAGCTACAGACGGGGCCAGGGAGCTCACAACACTATGGAGCCATG 
AGAACCGTGACTGAACAGTATGAGGAGGTGGACCAGTTTGGGAACACAGTCCTCATGTCTTCC 
ACCACAGTCACCGAGCAGGCAGAGCCACCCAGGAACCCAGGCTCCCACCTCGGGCTCCACGCC 
TCCCCCTTGCTGAGGCAGTTCCTGCACAGCCCAGCTGGGTTCAGCAGTGACCTGACAGAAGCT 
GAGACGGTGCAGGTGTCCTGCAGCTACTCCCAGCCAGCTGCCCAGTGAGGCCCACCGCCTCCC 
ACCACACCTGCCACCTGTTCCTGGCCTCCACTGCCCCAGGACTGAAGTGGGTACCTGCCTCCT
GTACACTGGAGCAAGGACCAAGAGGAAATGGCATCTTCAGAGGATTACTGTGGGCCATTTCCC
TTTCGCAGTTCTTTCAATAGGCCCAGTTCTTCCAAATGGAAAAAGAAAGGTCTGGAAGAGGCC
CACAGAGTTGCACAGGCGTGGGGGTAGGATGGGGGC

ORF Start: ATG at 295 | ORF Stop: TGA at 1873

SEQ ID NO: 84 | 526 aa | MW at 56300.1kD

NOV20g, CG111501-03 Protein Sequence

MPPKKKPQLPPKPAHLTQSHPPQRLPKPLPLSPSFSSEVGQREHQRGERDTAIPQPAKVPTTV 
DRGHIPLARCPSGHSQPSLQHGLSTTAPRPTKNQATGSNAQSSEPPKLNALNHDPTSPQWGPG 
PSGEQPMEGSHQGAPESPDSLQRNQKELQGLLNQVQALEKEAASSVDVQALRRLFEAVPQLGG 
AAPQAPAAHQKPEASVEQAFGELTRVSTEVAQLKEQTLARLLDIEEAVHKALSSMSSLQPEAS 
ARGHFQGPPKDHSAHKISVTVSSSARPSGSGQEVGGQTAVKNQAKVECHTEAQSQVKIRNHTE 
ARGHTASTAPSTRRQETSREYLCPPRVLPSSRDSPSSPTFISIQSATRKPLETPSFKGNPDVS 
VKSTQLAQDIGQALLHQKGVQDKTGKKDITQCSVQPEPAPPSASPLPRGWQKSVLELQTGPGS 
SQHYGAMRTVTEQYEEVDQFGNTVLMSSTTVTEQAEPPRNPGSHLGLHASPLLRQFLHSPAGF
SSDLTEAETVQVSCSYSQPAAQ



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 20B. 



Table 20B. Comparison of NOV20a against NOV20b through NOV20g.

Protein Sequence | NOV20a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV20b | 1..979; 1..979 | 893/979 (91%); 894/979 (91%)
NOV20c | 1..600; 4..603 | 514/600 (85%); 515/600 (85%)
NOV20d | 1300..1698; 4..402 | 365/399 (91%); 365/399 (91%)
NOV20e | 1300..1698; 4..402 | 366/399 (91%); 366/399 (91%)
NOV20f | 1300..1698; 4..402 | 365/399 (91%); 365/399 (91%)
NOV20g | 1186..1698; 14..526 | 460/513 (89%); 461/513 (89%)
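
The percentages in Table 20B can be re-derived from the residue counts. The following is a minimal sketch and not part of the original analysis: the function names are hypothetical, residue ranges are assumed to be inclusive, and the percentages are assumed to be truncated rather than rounded (which matches entries such as 514/600 (85%) and 460/513 (89%)).

```python
def span_length(residue_range: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start, end = residue_range.split("..")
    return int(end) - int(start) + 1


def percent_truncated(matches: int, aligned: int) -> int:
    """Integer percentage, truncated (an assumed convention, see above)."""
    return matches * 100 // aligned


# Values copied from the NOV20c and NOV20g rows of Table 20B.
print(span_length("1186..1698"))    # 513
print(percent_truncated(514, 600))  # 85
print(percent_truncated(460, 513))  # 89
```

Under these assumptions the matched-region length of each row equals the denominator of its identity fraction, for example 1186..1698 spans 513 residues.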



Further analysis of the NOV20a protein yielded the following properties shown in 
Table 20C. 



Table 20C. Protein Sequence Properties NOV20a

PSort analysis: 0.7000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV20a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 20D.

Table 20D. Geneseq Results for NOV20a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV20a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABP06536 | Human ORFX protein sequence SEQ ID NO: 13054 - Homo sapiens, 165 aa. [WO200192523-A2, 06-DEC-2001] | 406..566; 2..162 | 160/161 (99%); 161/161 (99%) | [illegible]
AAO05743 | Human polypeptide SEQ ID NO 19635 - Homo sapiens, 132 aa. [WO200164835-A2, 07-SEP-2001] | 202..333; 1..132 | 124/132 (93%); 127/132 (95%) | 1e-64
ABP06405 | Human ORFX protein sequence SEQ ID NO: 12792 - Homo sapiens, 135 aa. [WO200192523-A2, 06-DEC-2001] | 1111..1220; 28..135 | 102/110 (92%); 103/110 (92%) | 5e-54
ABP33418 | Human ORF2391 protein, SEQ ID NO:4782 - Homo sapiens, 113 aa. [WO200190366-A2, 29-NOV-2001] | 1299..1410; 2..113 | 108/112 (96%); 108/112 (96%) | 7e-53
AAB41480 | Human ORFX ORF1244 polypeptide sequence SEQ ID NO:2488 - Homo sapiens, 113 aa. [WO200058473-A2, 05-OCT-2000] | 1299..1410; 2..113 | 108/112 (96%); 108/112 (96%) | 7e-53

In a BLAST search of public sequence databases, the NOV20a protein was found to
have homology to the proteins shown in the BLASTP data in Table 20E.



Table 20E. Public BLASTP Results for NOV20a

Protein Accession Number | Protein/Organism/Length | NOV20a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
O70373 | Xin - Mus musculus (Mouse), 1677 aa. | 1..1698; 1..1677 | 1069/1734 (61%); 1217/1734 (69%) | 0.0
Q8TCG7 | Hypothetical 112.1 kDa protein - Homo sapiens (Human), 1059 aa (fragment). | 568..1327; 4..724 | 606/772 (78%); 628/772 (80%) | 0.0
BAC04783 | CDNA FLJ39102 fis, clone NTONG2002948, moderately similar to Mus musculus Xin mRNA - Homo sapiens (Human), 526 aa. | 1173..1698; 1..526 | 525/526 (99%); 526/526 (99%) | 0.0
BAC04655 | CDNA FLJ38622 fis, clone HEART2008364, moderately similar to Mus musculus Xin mRNA - Homo sapiens (Human), 394 aa. | 1305..1698; 1..394 | 394/394 (100%); 394/394 (100%) | 0.0
Q91957 | XIN - Gallus gallus (Chicken), 2562 aa. | 20..746; 17..794 | 398/793 (50%); 520/793 (65%) | 0.0

PFam analysis predicts that the NOV20a protein contains the domains shown in
Table 20F.

Table 20F. Domain Analysis of NOV20a

Pfam Domain | NOV20a Match Region | Identities; Similarities for the Matched Region | Expect Value

Example 21. 

The NOV21 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 21A.



Table 21A. NOV21 Sequence Analysis

SEQ ID NO: 85 | 618 bp

NOV21a, CG112595-01 DNA Sequence


GCCCTTCCATGCCAGACCTCAGCAAGTGGTCCGGGCCCTTGAGCCTGCAAGAAGTGGACGAGC 
AGCCGCAGCACCCGCTGCATGTCACCTACGCCGGGGCGGCGGTGGACGAGCTGGGCAAAGTGC 
TGACGCCCACCCAGGTTAAGAATAGACCCACCAGCATTTCGTGGGATGGTCTTGATTCAGGGA
AGCTCTACACCTTGGTCCTGACAGACCCGGATGCTCCCAGCAGGAAGGATCCCAAATACAGAG 
AATGGCATCATTTCCTGGTGGTCAACATGAAGGGCAATGACATCAGCAGTGGCACAGTCCTCT 
CCGATTATGTGGGCTCGGGGCCTCCCAAGGGCACAGGCCTCCACCGCTATGTCTGGCTGGTTT 
ACGAGCAGGACAGGCCGCTAAAGTGTGACGAGCCCATCCTCAGCAACCGATCTGGAGACCACC 
GTGGCAAATTCAAGGTGGCGTCCTTCCGTAAAAAGTATGAGCTCAGGGCCCCGGTGGCTGGCA 
CGTGTTACCAGGCCGAGTGGGATGACTATGTGCCCAAACTGTACGAGCAGCTGTCTGGGAAGT 
AGGGGGTTAGCTTGGGGACCTGAACTGTCCTGGAGGCCCCACCACACTCTG 


ORF Start: ATG at 9 | ORF Stop: TAG at 567

SEQ ID NO: 86 | 186 aa | MW at 20957.5kD

NOV21a, CG112595-01 Protein Sequence


MPDLSKWSGPLSLQEVDEQPQHPLHVTYAGAAVDELGKVLTPTQVKNRPTSISWDGLDSGKLY 
TLVLTDPDAPSRKDPKYREWHHFLVVNMKGNDISSGTVLSDYVGSGPPKGTGLHRYVWLVYEQ
DRPLKCDEPILSNRSGDHRGKFKVASFRKKYELRAPVAGTCYQAEWDDYVPKLYEQLSGK




SEQ ID NO: 87 | 1434 bp

NOV21b, CG112595-02 DNA Sequence


GAGCCAGTGTGCTGAGCTCTCCGCGTCGCCTCTGTCGCCCGCGCCTGGCCTACCGCGGCACTC 


CCGGCTGCACGCTCTGCTTGGCCTCGCCATGCCGGTGGACCTCAGCAAGTGGTCCGGGCCCTT 


GAGCCTGCAAGAAGTGGACGAGCAGCCGCAGCACCCGCTGCATGTCACCTACGCCGGGGCGGC
TATAATTTTACTCACTCACTCTGATTTATGTTTTGATCAAATTTGAACTTCATTTTGGGGGGT



ATTTTGGTACTGTGATGGGGTCATCAAATTATTAATCTGAAAATAGCAACCCAGAATGTAAAA 



AAGAAAAAGCTGGGGGGAAAAAGACCAGGTCTACAGTGATAGAGCAAAGCATCAAAGAATCTT 



GGTGGACGAGCTGGGCAAAGTGCTGACGCCCACCCAGGTTAAGAATAGACCCACCAGCATTTC
GTGGGATGGTCTTGATTCAGGGAAGCTCTACACCTTGGTCCTGACAGACCCGGATGCTCCCAG 
CAGGAAGGATCCCAAATACAGAGAATGGCATCATTTCCTGGTGGTCAACATGAAGGGCAATGA 
CATCAGCAGTGGCACAGTCCTCTCCGATTATGTGGGCTCGGGGCCTCCCAAGGGCACAGGCCT 
CCACCGCTATGTCTGGCTGGTTTACGAGCAGGACAGGCCGCTAAAGTGTGACGAGCCCATCCT 
CAGCAACCGATCTGGAGACCACCGTGGCAAATTCAAGGTGGCGTCCTTCCGTAAAAAGTATGA 
GCTCAGGGCCCCGGTGGCTGGCACGTGTTACCAGGCCGAGTGGGATGACTATGTGCCCAAACT 
GTACGAGCAGCTGTCTGGGAAGTAGGGGGTTAGCTTGGGGACCTGAACTGTCCTGGAGGCCCC
AAGCCATGTTCCCCAGTTCAGTGTTGCATGTATAATAGATTTCTCCTCTTCCTGCCCCCCTTG 



GCATGGGTGAGACCTGACCAGTCAGATGGTAGTTGAGGGTGACTTTTCCTGCTGCCTGGCCTT 



TAAGGGAGGTTTAAAAAAAAAAAAAAAAAAAAAGATTGGTTGCCTCTGCCTTTGTGATCCTGA 



GTCCAGAATGGTACACAATGTGATTTTATGGTGATGTCACTCACCTAGACAACCAGAGGCTGG 



CATTGAGGCTAACCTCCAACACAGTGCATCTCAGATGCCTCAGTAGGCATCAGTATGTCACTC 



TGGTCCCTTTAAAGAGCAATCCTGGAAGAAGCAGGAGGGAGGGTGGCTTTGCTGTTGTTGGGA 



CATGGCAATCTAGACCGGCAGCAGCGCTCGCTGACAGCTTGGGAGGAAACCTGAGATCTGTGT 



TTTTTAAATTGATCGTTCTTCATGGGGGTAAGAAAAGCTGGTCTGGAGTTGCTGAATGTTGCA 



TTAATTGTGCTGTTTGCTTGTAGTTGAATAAAAATAGAAACCTGAATG 



ORF Start: ATG at 92 | ORF Stop: TAG at 653

SEQ ID NO: 88 | 187 aa | MW at 21056.6kD

NOV21b, CG112595-02 Protein Sequence



MPVDLSKWSGPLSLQEVDEQPQHPLHVTYAGAAVDELGKVLTPTQVKNRPTSISWDGLDSGKL
YTLVLTDPDAPSRKDPKYREWHHFLVVNMKGNDISSGTVLSDYVGSGPPKGTGLHRYVWLVYE 
QDRPLKCDEPILSNRSGDHRGKFKVASFRKKYELRAPVAGTCYQAEWDDYVPKLYEQLSGK 
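
The ORF coordinates and protein lengths reported in Table 21A are mutually consistent, which can be checked with a short calculation. This is an illustrative sketch only: the helper name is hypothetical, and it assumes 1-based coordinates in which the ORF start is the first base of the ATG and the ORF stop is the first base of the stop codon.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Number of residues encoded between an ORF start and its stop codon.

    Assumes 1-based positions, with orf_stop pointing at the first base
    of the stop codon (an assumption, not stated in the table).
    """
    coding_bases = orf_stop - orf_start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_bases // 3


# Coordinates copied from Table 21A.
print(orf_protein_length(9, 567))   # NOV21a: 186
print(orf_protein_length(92, 653))  # NOV21b: 187
```

The same check could be applied to any table entry that reports both an explicit start and an explicit stop codon position.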



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 21B.



Table 21B. Comparison of NOV21a against NOV21b.

Protein Sequence | NOV21a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV21b | 3..186; 4..187 | 184/184 (100%); 184/184 (100%)



Further analysis of the NOV21a protein yielded the following properties shown in
Table 21C.



Table 21C. Protein Sequence Properties NOV21a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3603 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV21a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 21D.



Table 21D. Geneseq Results for NOV21a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV21a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAE21677 | Human phosphoethanolamine binding protein (PEBP) - Homo sapiens, 187 aa. [WO200218623-A2, 07-MAR-2002] | 3..186; 4..187 | 184/184 (100%); 184/184 (100%) | e-109
AAR49943 | Human hippocampal cholinergic neurotrophic peptide precursor - Homo sapiens, 187 aa. [WO9405788-A, 17-MAR-1994] | 3..186; 4..187 | 184/184 (100%); 184/184 (100%) | e-109
AAR27718 | HCNP precursor protein #2 - Homo sapiens, 187 aa. [EP511816-A, 04-NOV-1992] | 3..186; 4..187 | 184/184 (100%); 184/184 (100%) | e-109
AAR64268 | Phosphatidylethanolamine binding protein - Homo sapiens, 187 aa. [EP628631-A, 14-DEC-1994] | 3..186; 4..187 | 183/184 (99%); 183/184 (99%) | e-109
AAE21676 | Mouse phosphoethanolamine binding protein (PEBP) - Mus musculus, 187 aa. [WO200218623-A2, 07-MAR-2002] | 3..186; 4..187 | 160/184 (86%); 170/184 (91%) | 1e-95

In a BLAST search of public sequence databases, the NOV21a protein was found to
have homology to the proteins shown in the BLASTP data in Table 21E.



Table 21E. Public BLASTP Results for NOV21a

Protein Accession Number | Protein/Organism/Length | NOV21a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
AAH31102 | Prostatic binding protein - Homo sapiens (Human), 187 aa. | 3..186; 4..187 | 184/184 (100%); 184/184 (100%) | e-109
P30086 | Phosphatidylethanolamine-binding protein (PEBP) (Neuropolypeptide h3) (Hippocampal cholinergic neurostimulating peptide) (HCNP) (Raf kinase inhibitor protein) (RKIP) - Homo sapiens (Human), 186 aa. | 3..186; 3..186 | 184/184 (100%); 184/184 (100%) | e-109
[illegible] | phosphatidylethanolamine-binding protein - crab-eating macaque, 187 aa. | 3..186; 4..187 | 180/184 (97%); 181/184 (97%) | e-106
P48737 | Phosphatidylethanolamine-binding protein (PEBP) - Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey), 186 aa. | 3..186; 3..186 | 180/184 (97%); 181/184 (97%) | e-106
P13696 | Phosphatidylethanolamine-binding protein (PEBP) (Basic cytosolic 21 kDa protein) - Bos taurus (Bovine), 186 aa. | 3..186; 3..186 | 173/184 (94%); 177/184 (96%) | e-102


PFam analysis predicts that the NOV21a protein contains the domains shown in
Table 21F.



Table 21F. Domain Analysis of NOV21a

Pfam Domain | NOV21a Match Region | Identities; Similarities for the Matched Region | Expect Value
PBP | 1..171 | 91/201 (45%); 162/201 (81%) | 1.1e-84

Example 22.

The NOV22 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 22A. 



Table 22A. NOV22 Sequence Analysis

SEQ ID NO: 89 | 662 bp

NOV22a, CG112624-01 DNA Sequence

CGCAGCCAGCACCGGGCGGAGAGGGCTACCATGGGGAAAATCGCGCTGCAACTCAAAGCCACG
CTGGAGAACATCACCAACCTCCGGCCCGTGGGCGAGGACTTCCGGTGGTACCTGAAGGACAGT
GTGGCACTGAAGGGGGGCCGTGGCAGTGCTTCCATGGTCCAGAAGTGCAAGCTGTGTGCAAGA
GAAAATTCCATCGAGATTTTAAGCAGCACCATCAAGCCTTACAATGCTGAAGACAATGAGAAC
TTCAAGACAATAGTGGAGTTTGAGTGCCGGGGCCTTGAACCAGTTGATTTCCAGCCGCAGGCT
GGGTTTGCTGCTGAAGGTGTGGAGTCAGGGACAGCCTTCAGTGACATTAATCTGCAGGAGAAG
GACTGGACTGACTATGATGAAAAGGCCCAGGAGTCTGTGGGAATCTATGAGGTCACCCACCAG
TTTGTGAAGTGCTGATCCCTCTTCCTTCCCAGTTGCCCTTAAGAACTGAGAAAGGACAAAGTA
CTCTAAGCAGCAGAGCCCACAGAGGCTCGTTCCTTTGACCCTTGTCTCCTGGTGGCTATACGA
AACCTTCACAATCTGCATGCTGGACTTTATTACAGCTTCCCAAGCCCCATCAATAAAGCCCCT
GTTCACGCTGCACTGGTGCATGAAGGTGAAAT

ORF Start: ATG at 31 | ORF Stop: TGA at 454

SEQ ID NO: 90 | 141 aa | MW at 15790.6kD

NOV22a, CG112624-01 Protein Sequence

MGKIALQLKATLENITNLRPVGEDFRWYLKDSVALKGGRGSASMVQKCKLCARENSIEILSST
IKPYNAEDNENFKTIVEFECRGLEPVDFQPQAGFAAEGVESGTAFSDINLQEKDWTDYDEKAQ
ESVGIYEVTHQFVKC
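
The relationship between the DNA and protein entries of Table 22A can be illustrated by translating the start of the reported ORF. The sketch below is an assumption-laden illustration, not the method used to generate the table: it uses the standard genetic code, a hypothetical `translate` helper, and only the first 63 bases of SEQ ID NO: 89, with the ORF start taken as the 1-based position 31 given in the table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from a 1-based ORF start until a stop codon or the end."""
    protein = []
    for i in range(orf_start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 63 bases of the NOV22a DNA sequence (SEQ ID NO: 89).
nov22a_5prime = "CGCAGCCAGCACCGGGCGGAGAGGGCTACCATGGGGAAAATCGCGCTGCAACTCAAAGCCACG"
print(translate(nov22a_5prime, 31))  # MGKIALQLKAT
```

The output corresponds to the first eleven residues of the NOV22a protein sequence (SEQ ID NO: 90).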



Further analysis of the NOV22a protein yielded the following properties shown in 
Table 22B. 



Table 22B. Protein Sequence Properties NOV22a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane)
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV22a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 22C.



Table 22C. Geneseq Results for NOV22a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV22a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAG89328 | Human secreted protein, SEQ ID NO: 448 - Homo sapiens, 160 aa. [WO200142451-A2, 14-JUN-2001] | 1..141; 1..160 | 117/160 (73%); 121/160 (75%) | 2e-58
AAG01632 | Human secreted protein, SEQ ID NO: 5713 - Homo sapiens, 107 aa. [EP1033401-A2, 06-SEP-2000] | 1..85; 1..104 | 85/104 (81%); 85/104 (81%) | 4e-40
ABB60841 | Drosophila melanogaster polypeptide SEQ ID NO 9315 - Drosophila melanogaster, 161 aa. [WO200171042-A2, 27-SEP-2001] | 1..140; 1..160 | 63/160 (39%); 91/160 (56%) | 2e-28
AAG48660 | Arabidopsis thaliana protein fragment SEQ ID NO: 61473 - Arabidopsis thaliana, 167 aa. [EP1033405-A2, 06-SEP-2000] | 1..126; 1..151 | 50/152 (32%); 78/152 (50%) | 2e-15
AAG17798 | Arabidopsis thaliana protein fragment SEQ ID NO: 18956 - Arabidopsis thaliana, 167 aa. [EP1033405-A2, 06-SEP-2000] | 1..126; 1..151 | 50/152 (32%); 78/152 (50%) | 2e-15

In a BLAST search of public sequence databases, the NOV22a protein was found to
have homology to the proteins shown in the BLASTP data in Table 22D.



Table 22D. Public BLASTP Results for NOV22a

Protein Accession Number | Protein/Organism/Length | NOV22a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9NWV4 | CDNA FLJ20580 fis, clone REC00516 (Similar to hypothetical protein FLJ20580) - Homo sapiens (Human), 160 aa. | 1..141; 1..160 | 141/160 (88%); 141/160 (88%) | 1e-74
Q9DCH5 | 0610037L13Rik protein (RIKEN cDNA 0610037L13 gene) - Mus musculus (Mouse), 196 aa. | 3..141; 39..196 | 132/158 (83%); 136/158 (85%) | 6e-70
Q9D1H2 | 1110008H16Rik protein - Mus musculus (Mouse), 168 aa. | 1..133; 1..154 | 114/154 (74%); 122/154 (79%) | 9e-56
T22286 | hypothetical protein F46B6.3 - Caenorhabditis elegans, 491 aa. | 4..138; 328..482 | 65/155 (41%); 95/155 (60%) | 1e-29
AAF58406 | CG4646-PA - Drosophila melanogaster (Fruit fly), 163 aa. | 1..140; 1..160 | 63/160 (39%); 91/160 (56%) | 6e-28



PFam analysis predicts that the NOV22a protein contains the domains shown in
Table 22E.

Table 22E. Domain Analysis of NOV22a

Pfam Domain | NOV22a Match Region | Identities; Similarities for the Matched Region | Expect Value



Example 23. 

The NOV23 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 23A.



Table 23A. NOV23 Sequence Analysis

SEQ ID NO: 91 | 1019 bp

NOV23a, CG113823-01 DNA Sequence

GCAGTGACTCTGGGAAATCCTTCATTAATCATTCACACCTTCAGGGACATTTAAGAACTCACA
ATGGAGAAAGTCTCCATGAATGGAAGGAATGTGGGAGAGGCTTTATTCACTCCACAGACCTTG
CTGTGCGTATACAAACTCACAGGTCAGAAAAACCCTACAAATGTAAGGAATGTGGAAAAGGAT
TTAGATATTCTGCATACCTTAATATTCACATGGGAACCCACACTGGAGACAATCCCTATGAGT
GTAAGGAGTGTGGGAAAGCCTTCACCAGGTCTTGTCAACTTACTCAGCACAGAAAAACTCACA
CTGGAGAGAAACCTTATAAATGTAAGGATTGTGGGAGAGCCTTCACTGTTTCCTCTTGCTTAA
GTCAACATATGAAAATCCATGTGGGTGAGAAGCCTTATGAATGCAAGGAATGTGGGATAGCCT
TCACTAGATCTTCTCAACTTACTGAACATTTAAAAACTCACACTGCAAAGGATCCCTTTGAAT
GTAAGATATGTGGAAAATCCTTTAGAAATTCCTCATGCCTCAGTGATCACTTTCGAATTCACA
CTGGAATAAAACCCTATAAATGTAAGGATTGTGGGAAAGCCTTCACTCAGAACTCAGACCTTA
CTAAGCATGCACGAACTCACAGTGGAGAGAGGCCCTATGAATGTAAGGAATGTGGAAAGGCCT
TTGCCAGATCCTCTCGCCTTAGTGAACATACAAGAACTCACACTGGAGAGAAGCCTTTTGAAT
GTGTCAAATGTGGGAAAGCCTTTGCTATTTCTTCAAATCTTAGTGGACATTTGAGAATTCACA
CTGGAGAGAAGCCCTTTGAGTGCCTGGAATGTGGTAAAGCATTTACGCATTCCTCCAGTCTTA
ATAATCACATGCGGACCCACAGCGCCAAAAAACCATTCACGTGTATGGAATGTGGCAAAGCCT
TTAAGTTTCCCACGTGTGTTAACCTTCACATGCGGATTCACACTGGAGAAAACCCTACAATGT
AACAGTGTGGA




ORF Start: ATG at 219 | ORF Stop: TAA at 1008

SEQ ID NO: 92 | 263 aa | MW at 29620.6kD

NOV23a, CG113823-01 Protein Sequence


MGTHTGDNPYECKECGKAFTRSCQLTQHRKTHTGEKPYKCKDCGRAFTVSSCLSQHMKIHVGE 
KPYECKECGIAFTRSSQLTEHLKTHTAKDPFECKICGKSFRNSSCLSDHFRIHTGIKPYKCKD 
CGKAFTQNSDLTKHARTHSGERPYECKECGKAFARSSRLSEHTRTHTGEKPFECVKCGKAFAI 
SSNLSGHLRIHTGEKPFECLECGKAFTHSSSLNNHMRTHSAKKPFTCMECGKAFKFPTCVNLH 
MRIHTGENPTM 




SEQ ID NO: 93 | 894 bp

NOV23b, CG113823-02 DNA Sequence

GCCCTTTGTGCGTATACAAACTCACAGGTCAGAAAAACCCTACAAATGTAAGGAATGTGGAAA
AGGATTTAGATATTCTGCATACCTTAATATTCACATGGGAACCCACACTGGAGACAATCCCTA
TGAGTGTAAGGAGTGTGGGAAAGCCTTCACCAGGTCTTGTCAACTTACTCAGCACAGAAAAAC
TCACACTGGAGAGAAACCTTATAAATGTAAGGATTGTGGGAGAGCCTTCACTGTTTCCTCTTG
CTTAAGTCAACATATGAAAATCCATGTGGGTGAGAAGCCTTATGAATGCAAGGAATGTGGGAT
AGCCTTCACTAGATCTTCTCAACTTACTGAACATTTAAAAACTCACACTGCAAAGGATCCCTT
TGAATGTAAGATATGTGGAAAATCCTTTAGAAATTCCTCATGCCTCAGTGATCACTTTCGAAT
TCACACTGGAATAAAACCCTATAAATGTAAGGATTGTGGGAAAGCCTTCACTCAGAACTCAGA
CCTTACTAAGCATGCACGAACTCACAGTGGAGAGAGGCCCTATGAATGTAAGGAATGTGGAAA
GGCCTTTGCCAGATCCTCTCGCCTTAGTGAACATACAAGAACTCACACTGGAGAGAAGCCTTT
TGAATGTGTCAAATGTGGGAAAGCCTTTGCTATTTCTTCAAATCTTAGTGGACATTTGAGAAT
TCACACTGGAGAGAAGCCCTTTGAGTGCCTGGAATGTGGTAAAGCATTTACGCATTCCTCCAG
TCTTAATAATCACATGCGGACCCACAGCGCCAAAAAACCATTCACGTGTATGGAATGTGGCAA
AGCCTTTAAGTTTCCCACGTGTGTTAACCTTCACATGCGGATCCACACTGGAGAAAACCCTAC
AATGTAACAGTG




ORF Start: ATG at 98 | ORF Stop: TAA at 887

SEQ ID NO: 94 | 263 aa | MW at 29620.6kD

NOV23b, CG113823-02 Protein Sequence


MGTHTGDNPYECKECGKAFTRSCQLTQHRKTHTGEKPYKCKDCGRAFTVSSCLSQHMKIHVGE 
KPYECKECGIAFTRSSQLTEHLKTHTAKDPFECKICGKSFRNSSCLSDHFRIHTGIKPYKCKD 
CGKAFTQNSDLTKHARTHSGERPYECKECGKAFARSSRLSEHTRTHTGEKPFECVKCGKAFAI 
SSNLSGHLRIHTGEKPFECLECGKAFTHSSSLNNHMRTHSAKKPFTCMECGKAFKFPTCVNLH 
MRIHTGENPTM



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 23B. 



Table 23B. Comparison of NOV23a against NOV23b.

Protein Sequence | NOV23a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV23b | 1..263; 1..263 | 263/263 (100%); 263/263 (100%)

Further analysis of the NOV23a protein yielded the following properties shown in
Table 23C.



Table 23C. Protein Sequence Properties NOV23a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3008 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV23a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 23D.



Table 23D. Geneseq Results for NOV23a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV23a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB43912 | Human cancer associated protein sequence SEQ ID NO: 1357 - Homo sapiens, 580 aa. [WO200055350-A1, 21-SEP-2000] | 1..261; 262..522 | 260/261 (99%); 260/261 (99%) | e-164
AAY73346 | HTRM clone 619699 protein sequence - Homo sapiens, 549 aa. [WO9957144-A2, 11-NOV-1999] | 1..261; 231..491 | 259/261 (99%); 260/261 (99%) | e-164
AAB94123 | Human protein sequence SEQ ID NO: 14372 - Homo sapiens, 243 aa. [EP1074617-A2, 07-FEB-2001] | 1..232; 1..232 | 232/232 (100%); 232/232 (100%) | e-144
ABG05639 | Novel human diagnostic protein #5630 - Homo sapiens, 1560 aa. [WO200175067-A2, 11-OCT-2001] | 4..261; 1173..1430 | 159/258 (61%); 192/258 (73%) | e-100
ABG01726 | Novel human diagnostic protein #1717 - Homo sapiens, 1342 aa. [WO200175067-A2, 11-OCT-2001] | 4..261; 1060..1317 | 152/258 (58%); 199/258 (76%) | 1e-98



In a BLAST search of public sequence databases, the NOV23a protein was found to
have homology to the proteins shown in the BLASTP data in Table 23E.

Table 23E. Public BLASTP Results for NOV23a

Protein Accession Number | Protein/Organism/Length | NOV23a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q96SX9 | CDNA FLJ14574 fis, clone NT2RM4000751, moderately similar to zinc finger protein 184 - Homo sapiens (Human), 243 aa. | 1..232; 1..232 | 232/232 (100%); 232/232 (100%) | e-144
Q8WVX1 | Hypothetical 29.9 kDa protein - Homo sapiens (Human), 263 aa. | 57..261; 1..205 | 204/205 (99%); 204/205 (99%) | e-125
[illegible] | Similar to unnamed protein product - Homo sapiens (Human), 688 aa. | 4..261; 301..558 | 159/258 (61%); 192/258 (73%) | e-100
BAC04086 | CDNA FLJ35863 fis, clone TESTI2007524, moderately similar to HYPOTHETICAL ZINC FINGER PROTEIN - Homo sapiens (Human), 475 aa. | 4..261; 192..449 | 158/258 (61%); 194/258 (74%) | e-100
Q96MM9 | CDNA FLJ32133 fis, clone PEBLM2000308, moderately similar to zinc finger protein 135 - Homo sapiens (Human), 338 aa. | 4..261; 35..292 | 155/258 (60%); 192/258 (74%) | 1e-98



PFam analysis predicts that the NOV23a protein contains the domains shown in
Table 23F.

Table 23F. Domain Analysis of NOV23a

Pfam Domain | NOV23a Match Region | Identities; Similarities for the Matched Region | Expect Value
zf-C2H2 | 10..32 | 11/24 (46%); 22/24 (92%) | 4.4e-06
zf-C2H2 | 38..60 | 11/24 (46%); 20/24 (83%) | 7.1e-06
zf-C2H2 | 66..88 | 12/24 (50%); 21/24 (88%) | 7.5e-07
zf-C2H2 | 94..116 | 11/24 (46%); 19/24 (79%) | 8.1e-05
zf-BED | 79..117 | 14/52 (27%); 23/52 (44%) | 0.58
zf-C2H2 | 122..144 | 14/24 (58%); 22/24 (92%) | 4.5e-08
zf-C2H2 | 150..172 | 13/24 (54%); 18/24 (75%) | 3.3e-06
zf-C2H2 | 178..200 | 12/24 (50%); 18/24 (75%) | 0.00013
zf-BED | 163..201 | 15/52 (29%); 26/52 (50%) | 0.22
zf-C2H2 | 206..228 | 11/24 (46%); 22/24 (92%) | 2e-07
zf-C2H2 | 234..256 | 8/24 (33%); 17/24 (71%) | 0.0034
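
The nine zf-C2H2 regions listed in Table 23F follow the regular spacing of cysteine and histidine residues visible in the NOV23a protein sequence. The sketch below is only an illustration of that spacing, not a substitute for the Pfam profile search that produced the table: the regular expression (C-x2-C-x12-H-x3-H) is a simplified, assumed pattern, the function name is hypothetical, and each reported region is extended two residues upstream of the first cysteine so that the coordinates can be compared with the table.

```python
import re

# NOV23a protein sequence (SEQ ID NO: 92), copied from Table 23A.
NOV23A = (
    "MGTHTGDNPYECKECGKAFTRSCQLTQHRKTHTGEKPYKCKDCGRAFTVSSCLSQHMKIHVGE"
    "KPYECKECGIAFTRSSQLTEHLKTHTAKDPFECKICGKSFRNSSCLSDHFRIHTGIKPYKCKD"
    "CGKAFTQNSDLTKHARTHSGERPYECKECGKAFARSSRLSEHTRTHTGEKPFECVKCGKAFAI"
    "SSNLSGHLRIHTGEKPFECLECGKAFTHSSSLNNHMRTHSAKKPFTCMECGKAFKFPTCVNLH"
    "MRIHTGENPTM"
)

# Simplified C2H2 spacing (illustrative; much cruder than a Pfam profile).
C2H2 = re.compile(r"C.{2}C.{12}H.{3}H")


def c2h2_regions(protein: str) -> list[tuple[int, int]]:
    """1-based inclusive regions, starting two residues before the first Cys."""
    return [(m.start() - 1, m.end()) for m in C2H2.finditer(protein)]


regions = c2h2_regions(NOV23A)
print(len(regions))  # 9
print(regions[0])    # (10, 32)
```

The zf-BED rows of the table are not reproduced by this pattern, since they come from a different Pfam model.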



Example 24. 

The NOV24 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 24A. 

Table 24A. NOV24 Sequence Analysis

SEQ ID NO: 95 | 1945 bp

NOV24a, CG114098-01 DNA Sequence


GACAAGCTTCCTAGCGCTATGACTGTCGTCTCCGTCCCGCAGCGGGAGCCGCTCGTCCTGGGT
GGCCGCCTTGCGCCGCTTGGCTTTTCCTCCCGAGGTTACTTTGGGGCCCTCCCGATGGTGACC 
ACGGCTCCGCCTCCTTTACCCCGGATCCCGGACCCCCGGGCACTGCCCCCGACCCTCTTCCTC 
CCTCATTTCCTAGGGGGAGATGGCCCGTGTCTGACCCCCCAGCCTCGCGCTCCAGCAGCTCTG 
CCCAACCGCAGCCTCGCCGTGGCGGGAGGCACTCCTCGGGCAGCGCCGAAGAAGCGGCGAAAG 
AAGAAGGTGCGGGCCAGCCCCGCAGGGCAGCTGCCCAGCCGCTTCCACCAGTACCAGCAGCAC 
CGGAGCCCCGCGACCGGCCCGAGCGGAGCGCAGGAGGTCCCGGGCCCGGCCGCCGCCTTGGCC 
CCGAGTCCTGCAGCCGCAGCCGGCACGGAGGGAGCCAGCCCCGACCTTGCCCCGCTGCGGCCC 
GCGGCTCCCGGCCAAACCCCCCTCAGGAAAGAGGTTTTAAAATCAAAGATGGGAAAATCGGAG 
AAAATTGCCCTTCCCCATGGCCAGCTTGTTCATGGTATACACTTGTATGAGCAACCAAAGATA 
AACAGACAGAAAAGCAAATATAACTTGCCACTAACCAAGATCACCTCTGCAAAAAGAAATGAA 
AACAACTTTTGGCAGGATTCTGTTTCATCTGACAGAATTCAGAAGCAGGAAAAAAAGCCTTTT 
AAAAATACCGAGAACATTAAAAATTCGCATTTGAAGAAATCAGCATTTCTAACTGAAGTGAGC 
CAAAAGGAAAATTATGCTGGGGCAAAGTTTAGTGATCCACCTTCTCCTAGTGTTCTTCCAAAG 
CCTCCTAGTCACTGGATGGGAAGCACTGTTGAAAATTCCAACCAAAACAGGGAGCTGATGGCA 
GTACACTTAAAAACCCTCCTCAAAGTTCAAACTTAGATTTCAGATTTCAGTATGTGTGTAAAA 
CATAATTTTTCCCATATCCCTGGACTCTTGAGAAAATTGGTACAGAAATGGAAATTTGCCTTG


TTGCAACATACAATTGCAAAAGATGAGTTTAAAAAATTACATACAAACAGCTTGTATTATATT 


TTATATTTTGTAAATACTGTATACCATGTATTATGTGTATATTGTTCATACTTGAGAGGTATA 


TTATAGTTTTGTTATGAAAGTATGTATTTTGCCCTGCCCACATTGCAGGTGTTTTGTATATAT 


ACAATGGATAAATTTTAAGTGTGTGCTAAGGCACATGGAAGACCGATTTTATTTGCACAAGGT 


ACTGAGATTTTTTTCAAGAAACAGCTGTCAAATCTCAAGGTGAAGATCTAAATGTGAACAGTT 




TACTAATGCACTACTGAAGTTTAAATCTGTGGCACAATCAATGTAAGCATGGGGTTTGTTTCT 




CTAAATTGATTTGTAATCTGAAATTACTGAACAACTCCTATTCCCATTTTTGCTAAACTCAAT 


TTCTGGTTTTGGTATATATCCATTCCAGCTTAATGCCTCTAATTTTAATGCCAACAAAATTGG 


TTGTAATCAAATTTTAAAATAATAATAATTTGGCCCCCCCTTTTAAAATAGTCTTGACTCTTT 


GTGTGTGACTGTTTCTCATGTTTGAATGTGTGACTAGGAGATGATTTTGTGTGGTTGGATTTT 


TTTGACTTCTACTTTACTGGCTGAGTGTGAGCCGCCATGCCTGGCCATAATCTACATTTTCTT


ACCAGGAGCAGCATTGAGGTTTTTGAGCATAGTACTTGACTACTCTAGAGGCTGAGACGGGAG


CATCTCTTGAGCCTGAGAAGTGGAGATTGCAATTGAGCTAGGATCAGGCCACTGCACTCCAGC 


CTGGGTAACAGACGCTGTCTCAAAAAAAAGGCCAAGAGAAAGTAAGGGAGACAGA




ORF Start: ATG at 19 | ORF Stop: TAG at 979

SEQ ID NO: 96 | 320 aa | MW at 34527.4kD

NOV24a, CG114098-01 Protein Sequence

MTVVSVPQREPLVLGGRLAPLGFSSRGYFGALPMVTTAPPPLPRIPDPRALPPTLFLPHFLGG
DGPCLTPQPRAPAALPNRSLAVAGGTPRAAPKKRRKKKVRASPAGQLPSRFHQYQQHRSPATG
PSGAQEVPGPAAALAPSPAAAAGTEGASPDLAPLRPAAPGQTPLRKEVLKSKMGKSEKIALPH
GQLVHGIHLYEQPKINRQKSKYNLPLTKITSAKRNENNFWQDSVSSDRIQKQEKKPFKNTENI
KNSHLKKSAFLTEVSQKENYAGAKFSDPPSPSVLPKPPSHWMGSTVENSNQNRELMAVHLKTL
LKVQT



Further analysis of the NOV24a protein yielded the following properties shown in 
Table 24B. 



Table 24B. Protein Sequence Properties NOV24a

PSort analysis: 0.7000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.2265 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV24a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 24C.



Table 24C. Geneseq Results for NOV24a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV24a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAY48508 | Human breast tumour-associated protein 53 - Homo sapiens, 211 aa. [DE19813835-A1, 23-SEP-1999] | 121..320 / 15..211 | 183/200 (91%) / 185/200 (92%) | e-101
AAY48433 | Human prostate cancer-associated protein 130 - Homo sapiens, 153 aa. [DE19811194-A1, 16-SEP-1999] | 168..320 / 1..153 | 153/153 (100%) / 153/153 (100%) | 3e-85
AAM47280 | Human DNA topoisomerase 1-15 - Homo sapiens, 139 aa. [WO200181395-A1, 01-NOV-2001] | 201..319 / 21..138 | 50/121 (41%) / 71/121 (58%) | 2e-16
AAB42905 | Human ORFX ORF2669 polypeptide sequence SEQ ID NO:5338 - Homo sapiens, 139 aa. [WO200058473-A2, 05-OCT-2000] | 201..319 / 21..138 | 50/121 (41%) / 71/121 (58%) | 2e-16
AAW40200 | Infected cell protein number 4 - Herpes simplex virus, 1297 aa. [WO9804709-A2, 05-FEB-1998] | 40..165 / 670..785 | 47/126 (37%) / 60/126 (47%) | 8e-06
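
The percentages listed in Table 24C are consistent with truncation rather than rounding: for example, 183/200 is 91.5% but is listed as 91%, and 71/121 is about 58.7% but is listed as 58%. The sketch below reproduces that convention; the function name `percent_truncated` is hypothetical, and the truncation rule is an inference from the listed values, not a documented property of the search software.

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Integer percentage of matches over the aligned length,
    truncated toward zero (not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# (identities, similarities, aligned length) rows copied from Table 24C
rows = [(183, 185, 200), (153, 153, 153), (50, 71, 121), (47, 60, 126)]
for identities, similarities, length in rows:
    print(percent_truncated(identities, length),
          percent_truncated(similarities, length))
```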
In a BLAST search of public sequence databases, the NOV24a protein was found to
have homology to the proteins shown in the BLASTP data in Table 24D.



Table 24D. Public BLASTP Results for NOV24a

Protein Accession Number | Protein/Organism/Length | NOV24a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q12796 | Proline-rich protein 2 (B4-2 protein) - Homo sapiens (Human), 327 aa. | 1..320 / 1..327 | 320/327 (97%) / 320/327 (97%) | 0.0
Q63647 | Proline rich protein - Rattus norvegicus (Rat), 269 aa. | 34..320 / 1..269 | 212/287 (73%) / 224/287 (77%) | e-112
Q9NPJ4 | CDNA FLJ20312 FIS, clone HEP07362 (HSPC208) (Hypothetical 15.6 kDa protein) (Proline-rich nuclear receptor coactivator 2) - Homo sapiens (Human), 139 aa. | 201..319 / 21..138 | 50/121 (41%) / 71/121 (58%) | 7e-16
Q9CR73 | 0610011E17Rik protein (RIKEN cDNA 0610011E17 gene) - Mus musculus (Mouse), 140 aa. | 179..319 / 1..139 | 57/144 (39%) / 79/144 (54%) | 9e-16
Q9CXC6 | 0610011E17Rik protein - Mus musculus (Mouse), 140 aa. | 179..319 / 1..139 | 56/144 (38%) / 79/144 (53%) | 3e-15



PFam analysis predicts that the NOV24a protein contains the domains shown in
Table 24E.


Table 24E. Domain Analysis of NOV24a

Pfam Domain | NOV24a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 25. 

The NOV25 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 25A.



Table 25A. NOV25 Sequence Analysis 




SEQ ID NO: 97 | 2440 bp


NOV25a, CG114308-01 DNA Sequence


AGTTCTCTGTAGTGTTTGCCAATGTTGGAGCCGTCTGCAAAGTGTCCCCGGCAAGAAGAGGCT


GCCTACCACAAGGACTTTAGCTTACTTTTTAAAGATTGAAGAAAAAAAAGAAGACAGAAAAAG


AAGAACTCAAAGATACACAAAGTAATTTGAACCAAGGCTCAGAAGTTTTTGGAGCCGTGAGGG 


ATACAGCAGTTTGGTCAATATTGTCTTAACATGCTTCAAATAAATCAGCTTCTCTCCAAGATA 




AAATGGCAAACCCAAAAGAGAAAACTGCAATGTGTCTGGTAAATGAGTTAGCCCGTTTCAATA 
GAGTCCAACCCCAGTATAAACTTCTGAATGAAAGAGGGCCTGCTCATTCAAAGATGTTCTCAG 
TGCAGCTGAGTCTTGGTGAGCAGACATGGGAATCCGAAGGCAGCAGTATAAAGAAGGCTCAGC 
AGGCTGTTGCCAATAAAGCTTTGACTGAATCTACGCTTCCCAAACCAGTTCAGAAGCCACCCA 
AAAGTAATGTTAACAATAACCCAGGCAGTATAACTCCAACTGTGGAACTGAATGGGCTTGCTA 
TGAAAAGGGGAGAGCCTGCCATCTACAGGCCATTAGATCCAAAGCCATTCCCAAATTATAGAG
CTAATTACAACTTTCGGGGCATGTACAATCAGAGGTATCATTGCCCAGTGCCTAAGATCTTTT 
ATGTTCAGCTCACTGTAGGAAATAATGAATTTTTTGGGGAAGGAAAGACTCGACAAGCTGCTA 
GACACAATGCTGCAATGAAAGCCCTCCAAGCACTGCAGAATGAACCTATTCCAGAAAGATCTC 
CTCAGAATGGTGAATCAGGAAAGGATATGGATGATGACAAAGATGCAAATAAGTCTGAGATCA 
GCTTAGTGTTTGAAATTGCTCTGAAGCGAAATATGCCTGTCAGTTTTGAGGTTATTAAAGAAA
GTGGACCACCACATATGAAAAGCTTTGTTACTCGAGTGTCAGTAGGAGAGTTCTCTGCAGAAG 
GAGAAGGAAATAGCAAAAAACTCTCCAAGAAGCGCGCTGCGACCACCGTCTTACAGGAGCTTA
AAAAACTTCCACCTCTTCCTGTGGTGGAAAAGCCAAAACTATTTTTTAAAAAACGCCCTAAAA 
CAATAGTAAAGGCCGGACCAGAATATGGCCAAGGGATGAACCCTATTAGCCGCCTGGCGCAAA
TTCAACAGGCCAAAAAGGAAAAGGAGCCGGATTATGTTTTGCTTTCAGAAAGAGGAATGCCTC 
GACGTCGAGAATTTGTGATGCAGGTGAAGGTAGGCAATGAAGTTGCTACAGGAACAGGACCTA 
ATAAAAAGATAGCCAAAAAAAATGCTGCAGAAGCAATGCTGTTACAACTTGGTTATAAAGCAT 
CCACTAATCTTCAGGATCAACTTGAGAAGACAGGGGAAAACAAAGGATGGAGTGGTCCAAAGC 
CTGGGTTTCCTGAACCAACAAATAATACTCCAAAAGGAATTCTTCATTTGTCTCCTGATGTTT 
ATCAAGAGATGGAAGCCAGCCGCCACAAAGTAATCTCTGGCACTACTCTAGGCTATTTGTCAC 
CCAAAGATATGAACCAACCTTCAAGCTCTTTCTTCAGTATATCTCCCACATCGAATAGTTCAG 
CTACAATTGCCAGGGAACTCCTTATGAATGGAACATCTTCTACAGCTGAAGCCATAGGTTTAA 
AAGGAAGTTCTCCTACTCCCCCTTGTTCTCCAGTACAACCTTCAAAACAACTGGAATATTTAG 
CAAGGATTCAAGGCTTTCAGGTTCACTACTGTGATAGACAAAGTGGCAAAGAGTGTGTGACCT 
GTCTGACATTAGCCCCTGTGCAGATGACTTTCCATGCTATTGGAAGCTCCATTGAAGCCAGCC 
ATGATCAGGCAGCCTTAAGTGCCTTGAAACAATTTTCTGAACAAGGACTGGATCCAATCGATG
GAGCAATGAATATCGAAAAAGGTTCTCTTGAAAAACAAGCCAAGCATCTGAGAGAGAAAGCGG 
ACAATAACCAGGCACCCCCGGGCTCCATCGCTCAGGACTGCAAGAAATCAAACTCGGCCGTCT 
AGCAGCTCCCAGAACCCGCGGCTGCCACCGCATCCTTATAAACCTGTCAGCACGCATGAGGGT 




GTCTGTGTTCAGGGAAATGAATGACTAATACCATTATTTGAGTCTTATGTGAAGACAACACTA 


TTCTAACACGAGAGATAATATACATGGTACTGTTTATTCCACTGGGGAAAAATAAACTTTGAG 


CATTCCCTGGACTCGAGATCGATCTACTCATTGCCTGAGCGCGAAATTGTCCGGTCGGACTAA


ATAAGTAGAAATTGAAAAAGCAATTACTATTAATAAAAAAAAACAAATTACTTATAACCATCT 


TTTCCTCATTTTTCTTATTACTTATATATTTTAAATATTATTTTTC 


ORF Start: ATG at 255 | ORF Stop: TAG at 2079


SEQ ID NO: 98 | 608 aa | MW at 66785.6kD


NOV25a, CG114308-01 Protein Sequence


MANPKEKTAMCLVNELARFNRVQPQYKLLNERGPAHSKMFSVQLSLGEQTWESEGSSIKKAQQ
AVANKALTESTLPKPVQKPPKSNVNNNPGSITPTVELNGLAMKRGEPAIYRPLDPKPFPNYRA
NYNFRGMYNQRYHCPVPKIFYVQLTVGNNEFFGEGKTRQAARHNAAMKALQALQNEPIPERSP
QNGESGKDMDDDKDANKSEISLVFEIALKRNMPVSFEVIKESGPPHMKSFVTRVSVGEFSAEG
EGNSKKLSKKRAATTVLQELKKLPPLPVVEKPKLFFKKRPKTIVKAGPEYGQGMNPISRLAQI
QQAKKEKEPDYVLLSERGMPRRREFVMQVKVGNEVATGTGPNKKIAKKNAAEAMLLQLGYKAS
TNLQDQLEKTGENKGWSGPKPGFPEPTNNTPKGILHLSPDVYQEMEASRHKVISGTTLGYLSP
KDMNQPSSSFFSISPTSNSSATIARELLMNGTSSTAEAIGLKGSSPTPPCSPVQPSKQLEYLA
RIQGFQVHYCDRQSGKECVTCLTLAPVQMTFHAIGSSIEASHDQAALSALKQFSEQGLDPIDG
AMNIEKGSLEKQAKHLREKADNNQAPPGSIAQDCKKSNSAV




SEQ ID NO: 99 | 1899 bp


NOV25b, CG114308-02 DNA Sequence


TGTGAGGGATACAGCAGTTTGGTCAATATTGTCTTAACATGCTTCAAATAAATCAGATTGAAG 


AACTCCCTTCAGGATTTCTTGTAGAACAAGTCTGGTGTTGATGAAATCCTTCGGCTTTTGTTT 


GTCTGGGAAAGACTTTATCTCTCCTTCATAATTAAAGGATATTTTCACCAGATATACTATTCT 


AGATGTTCTCAGTGCAGCTGAGTCTTGGTGAGCAGACATGGGAATCCGAAGGCAGCAGTATAA 
AGAAGGCTCAGCAGGCTGTTGCCAATAAAGCTTTGACTGAATCTACGCTTCCCAAACCAGTTC
AGAAGCCACCCAAAAGTAATGTTAACAATAACCCAGGCAGTATAACTCCAACTGTGGAACTGA 
ATGGGCTTGCTATGAAAAGGGGAGAGCCTGCCATCTACAGGCCATTAGATCCAAAGCCATTCC 
CAAATTATAGAGCTAATTACAACTTTCGGGGCATGTACAATCAGAGGTATCATTGCCCAGTGC 
CTAAGATCTTTTATGTTCAGCTCACTGTAGGAAATAATGAATTTTTTGGGGAAGGAAAGACTC
GACAAGCTGCTAGACACAATGCTGCAATGAAAGCCCTCCAAGCACTGCAGAATGAACCTATTC 
CAGAAAGATCTCCTCAGAATGGTGAATCAGGAAAGGATATGGATGATGACAAAGATGCAAATA 
AGTCTGAGATCAGCTTAGTGTTTGAAATTGCTCTGAAGCGAAATATGCCCGTCAGTTTTGAGG 
TTATTAAAGAAAGTGGACCACCACATATGAAAAGCTTTGTTACTCGAGTGTCAGTAGGAGAGT 
TCTCTGCAGAAGGAGAAGGAAATAGCAAAAAACTCTCCAAGAAGCGCGCTGCGACCACCGTCT
TACAGGAGCTTAAAAAACTTCCACCTCTTCCTGTGGTGGAAAAGCCAAAACTATTTTTTAAAA
AACGCCCTAAAACAATAGTAAAGGCCGGACCAGAATATGGCCAAGGGATGAACCCTATTAGCC 
GCCTGGCGCAAATTCAACAGGCCAAAAAGGAAAAGGAGCCGGATTATGTTTTGCTTTCAGAAA 
GAGGAATGCCTCGACGTCGAGAATTTGTGATGCAGGTGAAGGTAGGCAATGAAGTTGCTACAG 
GAACAGGACCTAATAAAAAGATAGCCAAAAAAAATGCTGCAGAAGCAATGCTGTTACAACTTG 
GTTATAAAGCATCCACTAATCTTCAGGATCAACTTGAGAAGACAGGGGAAAACAAAGGATGGA 
GTGGTCCAAAGCCTGGGTTTCCTGAACCAACAAATAATACTCCAAAAGGAATTCTTCATTTGT 
CTCCTGATGTTTATCAAGAGATGGAAGCCAGCCGCCACAAAGTAATCTCTGGCACTACTCTAG 
GCTATTTGTCACCCAAAGATATGAACCAACCTTCAAGCTCTTTCTTCAGTATATCTCCCACAT 
CGAATAGTTCAGCTACAATTGCCAGGGAACTCCTTATGAATGGAACATCTTCTACAGCTGAAG 
CCATAGGTTTAAAAGGAAGTTCTCCTACTCCCCCTTGTTCTCCAGTACAACCTTCAAAACAAC 
TGGAATATTTAGCAAGGATTCAAGGCTTTCAGGCAGCCTTAAGTGCCTTGAAACAATTTTCTG 
AACAAGGACTGGATCCAATCGATGGAGCGATGAATATCGAAAAAGGTTCTCTTGAAAAACAAG 
CCAAGCATCTGAGAGAGAAAGCGGACAATAACCAGGCACCCCCGGGCTCCATCGCTCAGGACT 
GCAAGAAATCAAACTCGGCCGTCTAGCAGCTCCCAGAACCCGCGGCTGCCACCGCATCCTTAT 




AAACCTGTCAGCACGCATGAGGGTGTCTGTGTTCAGGGAAATGAATGACTAATACCAAAGGGC 




GAAATACTG 




ORF Start: ATG at 192 




ORF Stop: TAG at 1788 




SEQ ID NO: 100 | 532 aa | MW at 58273.9kD


NOV25b, CG114308-02 Protein Sequence


MFSVQLSLGEQTWESEGSSIKKAQQAVANKALTESTLPKPVQKPPKSNVNNNPGSITPTVELN
GLAMKRGEPAIYRPLDPKPFPNYRANYNFRGMYNQRYHCPVPKIFYVQLTVGNNEFFGEGKTR
QAARHNAAMKALQALQNEPIPERSPQNGESGKDMDDDKDANKSEISLVFEIALKRNMPVSFEV
IKESGPPHMKSFVTRVSVGEFSAEGEGNSKKLSKKRAATTVLQELKKLPPLPVVEKPKLFFKK
RPKTIVKAGPEYGQGMNPISRLAQIQQAKKEKEPDYVLLSERGMPRRREFVMQVKVGNEVATG 
TGPNKKIAKKNAAEAMLLQLGYKASTNLQDQLEKTGENKGWSGPKPGFPEPTNNTPKGILHLS 
PDVYQEMEASRHKVISGTTLGYLSPKDMNQPSSSFFSISPTSNSSATIARELLMNGTSSTAEA 
IGLKGSSPTPPCSPVQPSKQLEYLARIQGFQAALSALKQFSEQGLDPIDGAMNIEKGSLEKQA 
KHLREKADNNQAPPGSIAQDCKKSNSAV 



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 25B.


Table 25B. Comparison of NOV25a against NOV25b.

Protein Sequence | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV25b | 39..608 / 1..532 | 485/570 (85%) / 485/570 (85%)
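
The residue ranges in Table 25B can be turned into span lengths to see how the two variants line up. The sketch below is illustrative only; the helper name `span_length` is hypothetical, and the reading of the length difference as gap positions assumes that the 570-column alignment contains no gaps on the NOV25a side.

```python
def span_length(residue_range: str) -> int:
    """Length of an inclusive residue range written as 'first..last'."""
    first, last = (int(part) for part in residue_range.split(".."))
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


query_span = span_length("39..608")   # NOV25a residues in the alignment
match_span = span_length("1..532")    # NOV25b residues in the alignment
gap_positions = query_span - match_span
print(query_span, match_span, gap_positions)
```

Under that assumption the NOV25a span (570) equals the alignment length given in the table, and the 38-position difference corresponds to NOV25a residues that have no counterpart in NOV25b.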




Further analysis of the NOV25a protein yielded the following properties shown in 
Table 25C. 



Table 25C. Protein Sequence Properties NOV25a

PSort analysis: 0.7000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted
A search of the NOV25a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 25D.



Table 25D. Geneseq Results for NOV25a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB93660 | Human protein sequence SEQ ID NO:13179 - Homo sapiens, 570 aa. [EP1074617-A2, 07-FEB-2001] | 1..608 / 1..570 | 570/608 (93%) / 570/608 (93%) | 0.0
AAM40914 | Human polypeptide SEQ ID NO 5845 - Homo sapiens, 615 aa. [WO200153312-A1, 26-JUL-2001] | 1..548 / 13..560 | 537/548 (97%) / 542/548 (97%) | 0.0
AAM39128 | Human polypeptide SEQ ID NO 2273 - Homo sapiens, 479 aa. [WO200153312-A1, 26-JUL-2001] | 37..511 / 5..479 | 473/475 (99%) / 475/475 (99%) | 0.0
AAY83023 | Human staufen protein - Homo sapiens, 577 aa. [CA2238656-A1, 22-NOV-1999] | 39..559 / 1..555 | 273/576 (47%) / 346/576 (59%) | e-117
AAB94533 | Human protein sequence SEQ ID NO:15268 - Homo sapiens, 206 aa. [EP1074617-A2, 07-FEB-2001] | 306..511 / 1..206 | 206/206 (100%) / 206/206 (100%) | e-115



In a BLAST search of public sequence databases, the NOV25a protein was found to
have homology to the proteins shown in the BLASTP data in Table 25E.



Table 25E. Public BLASTP Results for NOV25a

Protein Accession Number | Protein/Organism/Length | NOV25a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9NUL3 | CDNA FLJM290 fis, clone PLACE1009622, weakly similar to maternal effect protein STAUFEN - Homo sapiens (Human), 570 aa. | 1..608 / 1..570 | 570/608 (93%) / 570/608 (93%) | 0.0
Q8R4D0 | RNA-binding protein staufen splice form A - Rattus norvegicus (Rat), 571 aa. | 1..608 / 1..571 | 532/609 (87%) / 550/609 (89%) | 0.0
Q96HM1 | Staufen (Drosophila, RNA-binding protein) homolog 2 - Homo sapiens (Human), 538 aa. | 37..608 / 5..538 | 532/572 (93%) / 534/572 (93%) | 0.0
Q8R4C9 | RNA-binding protein staufen splice form B - Rattus norvegicus (Rat), 512 aa. | 1..511 / 1..512 | 485/512 (94%) / 499/512 (96%) | 0.0
Q9UGG6 | 39k3 protein - Homo sapiens (Human), 479 aa. | 37..511 / 5..479 | 473/475 (99%) / 475/475 (99%) | 0.0



PFam analysis predicts that the NOV25a protein contains the domains shown in the 
Table 25F. 



Table 25F. Domain Analysis of NOV25a

Pfam Domain | NOV25a Match Region | Identities/Similarities for the Matched Region | Expect Value
dsrm | 9..73 | 26/74 (35%) / 53/74 (72%) | 2.9e-10
dsrm | 118..179 | 15/74 (20%) / 41/74 (55%) | 0.006
dsrm | 208..272 | 26/74 (35%) / 52/74 (70%) | 1.3e-17
dsrm | 308..373 | 27/74 (36%) / 56/74 (76%) | 1.2e-21
dsrm | 496..557 | 13/74 (18%) / 44/74 (59%) | 0.06

Example 26. 

The NOV26 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 26A. 



Table 26A. NOV26 Sequence Analysis 




SEQ ID NO: 101 | 2218 bp


NOV26a, CG114349-01 DNA Sequence


CCTCGCCGCATCCACTCTCCGGCCGGCCGCCTGCCCGCCGCCTCCTCCGTGCGCCCGCCAGCC 


TCGCCCGCGCCGTCACCATGAGCCAGGCCTACTCGTCCAGCCAGCGCGTGTCCTCCTACCGCC 
GCACCTTCGGCGGCGCCCCGGGCTTCCCGCTCGGCTCCCCGCTGAGCTCGCCCGTGTTCCCGC 
GGGCGGGTTTCGGCTCTAAGGGCTCCTCCAGCTCGGTGACGTCCCGCGTGTACCAGGTGTCGC 
GCACGTCGGGCGGGGCCGGGGGCCTGGGGTCGCTGCGGGCCAGCCGGCTGGGGACCACCCGCA 
CGCCCTCCTCCTACGGCGCAGGCGAGCTGCTGGACTTCTCACTGGCCGACGCGGTGAACCAGG
AGTTTCTGACCACGCGCACCAACGAGAAGGTGGAGCTGCAGGAGCTCAATGACCGCTTCGCCA
ACTACATCGAGAAGGTGCGCTTCCTGGAGCAGCAGAACGCGCTCGCCGCCGAAGTGAACCGGC
TCAAGGGCCGCGAGCCGACGCGAGTGGCCGAGCTCTACGAGGAGGAGCTGCGGGAGCTGCGGC
GCCAGGTGGAGGTGCTCACTAACCAGCGCGCGCGCGTCGACGTCGAGCGCGACAACCTGCTCG
ACGACCTGCAGCGGCTCAAGGCCAAGCTGCAGGAGGAGATTCAGTTGAAGGAAGAAGCAGAGA


ACAATTTGGCTGCCTTCCGAGCGGACGTGGATGCAGCTACTCTAGCTCGCATTGACCTGGAGC 
GCAGAATTGAATCTCTCAACGAGGAGATCGCGTTCCTTAAGAAAGTGCATGAAGAGGAGATCC 
GTGAGTTGCAGGCTCAGCTTCAGGAACAGCAGGTCCAGGTGGAGATGGACATGTCTAAGCCAG 
ACCTCACTGCCGCCCTCAGGGATATCCGGGCTCAGTATGAGACCATCGCGGCTAAGAACATTT 
CTGAAGCTGAGGAGTGGTACAAGTCGAAGGTGTCAGACCTGACCCAGGCAGCCAACAAGAACA 
ACGACGCCCTGCGCCAGGCCAAGCAGGAGATGATGGAATACCGACACCAGATCCAGTCCTACA 
CCTGCGAGATTGACGCCCTCAAGGGCACTAACGATTCCCTGATGAGGCAGATGCGGGAATTGG 
AGGACCGATTTGCCAGTGAGGCCAGTGGCTACCAGGACAACATTGCGCGCCTGGAGGAAGAAA 
TCCGGCACCTCAAGGATGAGATGGCCCGCCATCTGCGCGAGTACCAGGACCTGCTCAACGTGA
AGATGGCCCTGGATGTGGAGATTGCCACCTACCGGAAGCTGCTGGAGGGAGAGGAGAGCCGGA 
TCAATCTCCCCATCCAGACCTACTCTGCCCTCAACTTCCGAGAAACCAGCCCTGAGCAAAGGG 
GTTCTGAGGTCCATACCAAGAAGACGGTGATGATCAAGACCATCGAGACACGGGATGGGGAGG 
TCGTCAGTGAGGCGACACAGCAGCAGCATGAAGTGCTCTAAAGACGAGAGACCCTCTGCCACC 
AGAGACCGTCCTCACCCCTGTCCTCACTGCTCCCTGAAGCCCAGCCTTCTTCCATCCCAGGAC 


ACCACACCCAGCCTCAGTCCTCCCGTCACAGCCTCTGACCCCTCCTCACTGGCCATCCCTCGT 


GGTCCCCAACAGCGACATAGCCCATCCCTGCCTGGTCACAGGCATGCCCCGGCCACCTCTGCG 


GACCCCAGCTGTGAGCCTTGGCTGTTGGCAGTGAGTGAGCCTGGCTCTTGTGCTGGATGGAGC 


CCAGGCGGGAGCGGTGGCCCTGTCCCTCCCACCTCTGTGACCTGAGGCCTACGCTTTGGCTCT 


GGAGATAGCCCCAGAGCAGGGTGTTGGGATACTGCAGGGCCAGGACTGAGCCCCGCAGACCTC 


CCCAGCCCCTAGCCCAGGAGAGAGAAAGCCAGGCAGGTAGCCTGGGGGACTAGCCCTGTGGAG 


ACTGGGGGGCTTGAAATTGTCCCCGTGGTCTCTTACTTTCCTTTCCCCAGCCCAGGGTGGACT 


TAGAAAGCAGGGGCTACAAGAGGGAATCCCCGAAGGTGCTGGAGGTGGGAGCAGGAGATTGAG 


AAGGAGAGAAAGTGGGTGAGATGCTGGAGAAGAGAGAGGAGGAGAGAGGCAGAGAGCGGTCTG 


AGGCTGGTGGGAGGGGCGCCCACCTCCCCACGCCCTCCCCCCCCCTGCTGCAGGGGCTCTGGA 


GAGAAACAATAAA 




ORF Start: ATG at 81 | ORF Stop: TAA at 1488


SEQ ID NO: 102 | 469 aa | MW at 53464.1kD


NOV26a, CG114349-01 Protein Sequence


MSQAYSSSQRVSSYRRTFGGAPGFPLGSPLSSPVFPRAGFGSKGSSSSVTSRVYQVSRTSGGA 
GGLGSLRASRLGTTRTPSSYGAGELLDFSLADAVNQEFLTTRTNEKVELQELNDRFANYIEKV 
RFLEQQNALAAEVNRLKGREPTRVAELYEEELRELRRQVEVLTNQRARVDVERDNLLDDLQRL 
KAKLQEEIQLKEEAENNLAAFRADVDAATLARIDLERRIESLNEEIAFLKKVHEEEIRELQAQ 
LQEQQVQVEMDMSKPDLTAALRDIRAQYETIAAKNISEAEEWYKSKVSDLTQAANKNNDALRQ 
AKQEMMEYRHQIQSYTCEIDALKGTNDSLMRQMRELEDRFASEASGYQDNIARLEEEIRHLKD 
EMARHLREYQDLLNVKMALDVEIATYRKLLEGEESRINLPIQTYSALNFRETSPEQRGSEVHT 
KKTVMIKTIETRDGEVVSEATQQQHEVL




SEQ ID NO: 103 | 1865 bp


NOV26b, CG114349-02 DNA Sequence


CCTCGCCGCATCCACTCTCCGGCCGGCCGCCTGCCCGCCGCCTCCTCCGTGCGCCCGCCAGCC 


TCGCCCGCGCCGTCACCATGAGCCAGGCCTACTCGTCCAGCCAGCGCGTGTCCTCCTACCGCC

GCACCTTCGGCGGCGCCCCGGGCTTCCCGCTCGGCTCCCCGCTGAGCTCGCCCGTGTTCCCGC 

GGGCGGGTTTCGGCTCTAAGGGCTCCTCCAGCTCGGTGACGTCCCGCGTGTACCAGGTGTCGC 

GCACGTCGGGCGGGGCCGGGGGCCTGGGGTCGCTGCGGGCCAGCCGGCTGGGGACCACCCGCA 

CGCCCTCCTCCTACGGCGCAGGCGAGCTGCTGGACTTCTCACTGGCCGACGCGGTGAACCAGG 

AGTTTCTGACCACGCGCACCAACGAGAAGGTGGAGCTGCAGGAGCTCAATGACCGCTTCGCCA

ACTACATCGAGAAGGTGCGCTTCCTGGAGCAGCAGAACGCGCTCGCCGCCGAAGTGAACCGGC 

TCAAGGGCCGCGAGCCGACGCGAGTGGCCGAGCTCTACGAGGAGGAGCTGCGGGAGCTGCGGC 

GCCAGGTGGAGGTGCTCACTAACCAGCGCGCGCGCGTCGACGTCGAGCGCGACAACCTGCTCG 

ACGACCTGCAGCGGCTCAAGGCCAAGCTGCAGGAGGAGATTCAGTTGAAGGAAGAAGCAGAGA 

ACAATTTGGCTGCCTTCCGAGCGGACGTGGATGCAGCTACTCTAGCTCGCATTGACCTGGAGC 

GCAGAATTGAATCTCTCAACGAGGAGATCGCGTTCCTTAAGAAAGTGCATGAAGAGGAGATCC 

GTGAGTTGCAGGCTCAGCTTCAGGAAATCCGGCACCTCAAGGATGAGATGGCCCGCCATCTGC

GCGAGTACCAGGACCTGCTCAACGTGAAGATGGCCCTGGATGTGGAGATTGCCACCTACCGGA 

AGCTGCTGGAGGGAGAGGAGAGCCGGATCAATCTCCCCATCCAGACCTACTCTGCCCTCAACT 

TCCGAGAAACCAGCCCTGAGCAAAGGGGTTCTGAGGTCCATACCAAGAAGACGGTGATGATCA 

AGACCATCGAGACACGGGATGGGGAGGTCGTCAGTGAGGCCACACAGCAGCAGCATGAAGTGC 

TCTAAAGACAGAGACCCTCTGCCACCAGAGACCGTCCTCACCCCTGTCCTCACTGCTCCCTGA 


AGCCAGCCTTCTTCCATCCCAGGACACCACACCCAGCCTCAGTCCTCCCCTCACAGCCTCTGA 




CCCCTCCTCACTGGCCATCCCTCGTGGTCCCCAACAGCGACATAGCCCATCCCTGCCTGGTCA 


CAGGCATGCCCCGGCCACCTCTGCGGACCCCAGCTGTGAGCCTTGGCTGTTGGCAGTGAGTGA 




GCCTGGCTCTTGTGCTGGATGGAGCCCAGGCGGGAGCGGTGGCCCTGTCCCTCCCACCTCTGT 




GACCTGAGGCCTACGCTTTGGCTCTGGAGATAGCCCCAGAGCAGGGTGTTGGGATACTGCAGG 


GCCAGGACTGAGCCCCGCAGACCTCCCCAGCCCCTAGCCCAGGAGAGAGAAAGCCAGGCAGGT 




AGCCTGGGGGACTAGCCCTGTGGAGACTGGGGGGCTTGAAATTGTCCCCGTGGTCTCTTACTT 




TCCTTTCCCCAGCCCAGGGTGGACTTAGAAAGCAGGGGCTACAAGAGGGAATCCCCGAAGGTG 




CTGGAGGTGGGAGCAGGAGATTGAGAAGGAGAGAAAGTGGGTGAGATGCTGGAGAAGAGAGAG 


GAGGAGAGAGGCAGAGAGCGGTCTGAGGCTGGTGGGAGGGGCGCCCACCTCCCCACGCCCTCC 


CCCCCCCTGCTGCAGGGGCTCTGGAGAGAAACAATAAA 


ORF Start: ATG at 81 | ORF Stop: TAA at 1137


SEQ ID NO: 104 | 352 aa | MW at 39913.2kD


NOV26b, CG114349-02 Protein Sequence


MSQAYSSSQRVSSYRRTFGGAPGFPLGSPLSSPVFPRAGFGSKGSSSSVTSRVYQVSRTSGGA
GGLGSLRASRLGTTRTPSSYGAGELLDFSLADAVNQEFLTTRTNEKVELQELNDRFANYIEKV
RFLEQQNALAAEVNRLKGREPTRVAELYEEELRELRRQVEVLTNQRARVDVERDNLLDDLQRL
KAKLQEEIQLKEEAENNLAAFRADVDAATLARIDLERRIESLNEEIAFLKKVHEEEIRELQAQ
LQEIRHLKDEMARHLREYQDLLNVKMALDVEIATYRKLLEGEESRINLPIQTYSALNFRETSP
EQRGSEVHTKKTVMIKTIETRDGEVVSEATQQQHEVL




SEQ ID NO: 105 | 1291 bp


NOV26c, CG114349-03 DNA Sequence


ATGAGCCAGGCCTACTCGTCCAGCCAGCGCGTGTCCTCCTACCGCCGCACCTTCGGCGGGGCC 
CCGGGCTTCCCGCTCGGCTCCCCGCTGAGCTCGCCCGTGTTCCCGCGGGCGGGTTTCGGCTCT 
AAGGGCTCCTCCAGCTCGGTGACGTCCCGCGTGTACCAGGTGTCGCGCACGTCGGGCGGGGCC 
GGGGGCCTGGGGTCGCTGCGGGCCAGCCGGCTGGGGACCACCCGCACGCCCTCCTCCTACGGC 
GCAGGCGAGCTGCTGGACTTCTCACTGGCCGACGCGGTGAACCAGGAGTTTCTGACCACGCGC 
ACCAACGAGAAGGTGGAGCTGCAGGAGCTCAATGACCGCTTCGCCAACTACATCGAGAAGGTG 
CGCTTCCTGGAGCAGCAGAACGCGGCGCTCGCCGCCGAAGTGAACCGGCTCAAGGGCCGCGAG
CCGACGCGAGTGGCCGAGCTCTACGAGAAGGAAGAGAACAATTTGGCTGCCTTCCGAGCGGAC 
GTGGATGCAGCTACTCTAGCTCGCATTGACCTGGAGCGCAGAATTGAATCTCTCAACGAGGAG 
ATCGCGTTCCTTAAGAAAGTGCATGAAGAGGAGATCCGTGAGTTGCAGGCTCAGCTTCAGGAA 
CAGCAGGTCCAGGTGGAGATGGACATGTCTAAGCCAGACCTCACTGCCGCCCTCAGGGACATC 
CGGGCTCAGTATGAGACCATCGCGGCTAAGAACATTTCTGAAGCTGAGGAGTGGTACAAGTCG 
AAGGTGTCAGACCTGACCCAGGCAGCCAACAAGAACAACGACGCCCTGCGCCAGGCCAAGCAG
GAGATGATGGAATACCGACACCAGATCCAGTCCTACACCTGCGAGATTGACGCCCTGAAGGGC 
ACTAACGATTCCCTGATGAGGCAGATGCGGGAATTGGAGGACCGATTTGCCAGTGAGGCCAGT 
GGCTACCAGGACAACATTGCGCGCCTGGAGGAGGAAATCCGGCACCTCAAGGATGAGATGGCC 
CGCCATCTGCGCGAGTACCAGGACCTGCTCAACGTGAAGATGGCCCTGGATGTGGAGATTGCC 
ACCTACCGGAAGCTGCTGGAGGGAGAGGAGAGCCGGATCAATCTCCCCATCCAGACCTACTCT 
GCCCTCAACTTCCGAGAAACCAGCCCTGAGCAAAGGGGTTCTGAGGTCCATACCAAGAAGACG
GTGATGATCAAGACCATGGAGACACGGGATGGGGAGGTCGTCAGTGAGGCCACACAGCAGCAG 
CATGAAGTGCTCTAAAGACGAGAGACCCTCT 


ORF Start: ATG at 1 | ORF Stop: TAA at 1273


SEQ ID NO: 106 | 424 aa | MW at 47999.6kD


NOV26c, CG114349-03 Protein Sequence


MSQAYSSSQRVSSYRRTFGGAPGFPLGSPLSSPVFPRAGFGSKGSSSSVTSRVYQVSRTSGGA
GGLGSLRASRLGTTRTPSSYGAGELLDFSLADAVNQEFLTTRTNEKVELQELNDRFANYIEKV
RFLEQQNAALAAEVNRLKGREPTRVAELYEKEENNLAAFRADVDAATLARIDLERRIESLNEE
IAFLKKVHEEEIRELQAQLQEQQVQVEMDMSKPDLTAALRDIRAQYETIAAKNISEAEEWYKS
KVSDLTQAANKNNDALRQAKQEMMEYRHQIQSYTCEIDALKGTNDSLMRQMRELEDRFASEAS
GYQDNIARLEEEIRHLKDEMARHLREYQDLLNVKMALDVEIATYRKLLEGEESRINLPIQTYS
ALNFRETSPEQRGSEVHTKKTVMIKTMETRDGEVVSEATQQQHEVL
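
The relationship between a listed DNA sequence and its encoded polypeptide can be checked by translating the ORF with the standard genetic code. The following is a minimal sketch, not the method used in the original analysis: the names `CODON_TABLE` and `translate` are illustrative, and the example uses only the first 24 bases of the NOV26c ORF, which starts at position 1.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 24 bases of the NOV26c DNA sequence (ORF Start: ATG at 1)
nov26c_start = "ATGAGCCAGGCCTACTCGTCCAGC"
print(translate(nov26c_start))
```

The translated fragment, MSQAYSSS, matches the first eight residues of the NOV26c protein sequence.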



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 26B. 



Table 26B. Comparison of NOV26a against NOV26b and NOV26c.

Protein Sequence | NOV26a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV26b | 77..255 / 77..255 | 159/179 (88%) / 159/179 (88%)
NOV26c | 77..469 / 77..424 | 326/394 (82%) / 328/394 (82%)
Further analysis of the NOV26a protein yielded the following properties shown in
Table 26C.



Table 26C. Protein Sequence Properties NOV26a

PSort analysis: 0.5102 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.2347 probability located in mitochondrial inner membrane; 0.2347 probability located in mitochondrial intermembrane space

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV26a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 26D.



Table 26D. Geneseq Results for NOV26a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV26a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAY92335 | Human vimentin - Homo sapiens, 466 aa. [WO200020448-A2, 13-APR-2000] | 12..469 / 9..465 | 291/464 (62%) / 365/464 (77%) | e-157
AAB66348 | Human vimentin - Homo sapiens, 466 aa. [EP1067142-A1, 10-JAN-2001] | 12..469 / 9..465 | 291/464 (62%) / 364/464 (77%) | e-156
AAW54351 | Vimentin - Homo sapiens, 465 aa. [WO9810291-A1, 12-MAR-1998] | 12..469 / 8..464 | 291/464 (62%) / 364/464 (77%) | e-156
AAU87694 | Human pancreatic tumour protein #6 - Homo sapiens, 466 aa. [WO200212331-A2, 14-FEB-2002] | 12..469 / 9..465 | 290/464 (62%) / 363/464 (77%) | e-156
AAB29635 | Human pollinosis-associated gene 795-encoded protein, SEQ ID NO:26 - Homo sapiens, 466 aa. [WO200065050-A1, 02-NOV-2000] | 12..469 / 9..465 | 290/464 (62%) / 363/464 (77%) | e-156



In a BLAST search of public sequence databases, the NOV26a protein was found to
have homology to the proteins shown in the BLASTP data in Table 26E.
Table 26E. Public BLASTP Results for NOV26a

Protein Accession Number | Protein/Organism/Length | NOV26a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
AAH32116 | Desmin - Homo sapiens (Human), 470 aa. | 1..469 / 1..470 | 469/470 (99%) / 469/470 (99%) | 0.0
Q9UHN5 | Mutant desmin - Homo sapiens (Human), 470 aa. | 1..469 / 1..470 | 468/470 (99%) / 469/470 (99%) | 0.0
AAC39939 | Mutant desmin - Homo sapiens (Human), 470 aa. | 1..469 / 1..470 | 468/470 (99%) / 468/470 (99%) | 0.0
AAC39938 | Mutant desmin - Homo sapiens (Human), 470 aa. | 1..469 / 1..470 | 468/470 (99%) / 468/470 (99%) | 0.0
Q8TD99 | Mutant desmin - Homo sapiens (Human), 470 aa. | 1..469 / 1..470 | 468/470 (99%) / 468/470 (99%) | 0.0



PFam analysis predicts that the NOV26a protein contains the domains shown in
Table 26F.



Table 26F. Domain Analysis of NOV26a

Pfam Domain | NOV26a Match Region | Identities/Similarities for the Matched Region | Expect Value
filament | 107..414 | 183/359 (51%) / 292/359 (81%) | 4.8e-171
DUF164 | 203..421 | 40/247 (16%) / 117/247 (47%) | 0.45

Example 27.

The NOV27 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 27A. 



Table 27A. NOV27 Sequence Analysis


SEQ ID NO: 107 | 1903 bp


NOV27a, CG114503-01 DNA Sequence


CGGGGGCGGGGCCGGTAGTGGGAGTGCGGGGCGCGCGGTGACAGCGCGGGGTTGGCGGCGTGG 


GACCCAGGGGGCGACAGAGGCAGCAGCAGCCCGAGGCCTGAGGAGAGGAGACCGGCGGCGGCG 


GCAATGCTGGAGACCTTTCGCGAGCGGCTGCTGAGCGTGCAGCAGGATTTCACCTCCGGGCTG 
AAGACTTTAAGTGACAAGTCAAGAGAAGCAAAAGTGAAAAGCAAACCCAGGTATGAGGATACA 
TGGGCTGCACTTCACAGAAGAGCCAAAGACTGTGCAAGTGCTGGAGAGCTGGTGGATAGCGAG 
GTGGTCATGCTTTCTGCGCACTGGGAGAAGAAAAAGACAAGCCTCGTGGAGCTGCAAGAGCAG 
CTCCAGCAGCTCCCAGCTTTAATCGCAGACTTAGAATCCATGACAGCAAATCTGACTCATTTA 
GAGGCGAGTTTTGAGGAGGTAGAGAACAACCTGCTGCATCTGGAAGACTTATGTGGGCAGTGT 
GAATTAGAAAGATGCAAACATATGCAGTCCCAGCAACTGGAGAATTACAAGAAAAATAAGAGG 
AAGGAACTTGAAACCTTCAAAGCTGAACTAGATGCAGAGCACGCCCAGAAGGTCCTGGAAATG
GAGCACACCCAGCAAATGAAGCTGAAGGAGCGGCAGAAGTTTTTTGAGGAAGCCTTCCAGCAG 
GACATGGAGCAGTACCTGTCCACTGGCTACCTGCAGATTGCAGAGCGGCGAGAGCCCATAGGC 
AGCATGTCATCCATGGAAGTGAACGTGGACATGCTGGAGCAGATGGTCCTGATGGACATATCG 
GACCAGGAGGCCCTGGACGTCTTCCTGAACTCTGGAGGAGAAGAGAACACTGTGCTGTCCCCC 
GCCTTAGGTAGGGTTGACAAACTTGCATTAGCTGAACCAGGGCAGTATCGATGCCACTCCCCT
CCAAAGGTGAGACGTGAGAACCATCTGCCAGTCACTTACGCATAAACCCCCAAGCTCACAGCC 
AGCTCCTGGCTCCCTAACCCCACGGTTCCACACGGCTGTGTGGCAGCTGCAACAGTGGTGTGG 




TTCCGTCATGAATTCTTCTCAAAGATTTGACATGCTCCACTCCGGTAACTTTGGTGAGTTGAG




AGCTTTCTTGTTTGTTTTCCCTCCTTTACCATCCAGAAATCCATTTGAGTCTGCTCCTTGTGG 




TTAAGGACTGGCGTTTGCAGGGAGGTGCGGACTCTCCTGCGGGGCTCACGGGAAACTCTTCCC 




TCTTCGTGCGACAGGCATTTAGGGGCGTGCCTGCCATGGGCAAAGCCATGGTGTGTGTTCAGC 




TCTTGGCCTGTGTTGTAAACTTAGTTGCACTTCAGTTCCTTTCATCCCTTCACAAAATTTTGT 




TTCACATTCATGCAGCAAATATGGGCTGAGGTGCCAGACCTGTACCTGGGCTTGGTGCGTTTC 




AAATTTCAGACCAGTTCTTTGGGCTGGGTCAAGGCAAAGCTCAGTCGTCCCAGCAGCACCTCA




GCCATCTGTAGAAGGTTCTACCATTACCACGGTTTCAGCTTCCTCTAAACTTCTCACCCGCTT 




CTCCTGGCAATCTGTCAGAACGGTGTCATCCTGGGGAAGAGAAGGAGCTTGGGTGCATTTGCC




CTCATCCTGAGAAGGCCAGAATACTGGAGACCAGCGTGAACCCTCACCCAGAGTCAGGGGAAG 




ATTTAGAAACAGTGACACCTGCATATAGAATTTTGATTCCTTGAAGAGCCTATTTAGTTCCAT




AAAATTGGAGAACTGCTGAAGGTCAGTAATTCCGACTTTCTCAGCAGTGGTGTCTCTGAATTA 




CTGCAAAGGGTAAAAAAAAAAAAAAAAAAAACTTATCGATACCGTCGACCTCGATGATGATGA 




TGATGATGTCGAC 




ORF Start: ATG at 130 | ORF Stop: TAA at 988


SEQ ID NO: 108 | 286 aa | MW at 32975.1kD


NOV27a, CG114503-01 Protein Sequence


MLETFRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRYEDTWAALHRRAKDCASAGELVDSEV 
VMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTHLEASFEEVENNLLHLEDLCGQCE 
LERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQKVLEMEHTQQMKLKERQKFFEEAFQQD 
MEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQMVLMDISDQEALDVFLNSGGEENTVLSPA 
LGRVDKLALAEPGQYRCHSPPKVRRENHLPVTYA 




SEQ ID NO: 109 | 966 bp


NOV27b, CG114503-02 DNA Sequence


CCGAGGCCTGAGGAGAGGAGACCGGCGGCGGCGGCAATGCTGGAGACCCTTCGCGAGCGGCTG 


CTGAGCGTGCAGCAGGATTTCACCTCCGGGCTGAAGACTTTAAGTGACAAGTCAAGAGAAGCA 

AAAGTGAAAAGCAAACCCAGGACTGTTCCATTTTTGCCAAAGTACTCTGCTGGATTAGAATTA 

CTTAGCAGGTATGAGGATACATGGGCTGCACTTCACAGAAGAGCCAAAGACTGTGCAAGTGCT 

GGAGAGCTGGTGGATAGCGAGGTGGTCATGCTTTCTGCGCACTGGGAGAAGAAAAAGACAAGC 

CTCGTGGAGCTGCAAGAGCAGCTCCAGCAGCTCCCAGCTTTAATCGCAGACTTAGAATCCATG 

ACAGCAAATCTGACTCATTTAGAGGCGAGTTTTGAGGAGGTAGAGAACAACCTGCTGCATCTG

GAAGACTTATGTGGGCAGTGTGAATTAGAAAGATGCAAACATATGCAGTCCCAGCAACTGGAG

AATTACAAGAAAAATAAGAGGAAGGAACTTGAAACCTTCAAAGCTGAACTAGATGCAGAGCAC 

GCCCAGAAGGTCCTGGAAATGGAGCACACCCAGCAAATGAAGCTGAAGGAGCGGCAGAAGTTT 

TTTGAGGAAGCCTTCCAGCAGGACATGGAGCAGTACCTGTCCACTGGCTACCTGCAGATTGCA 

GAGCGGCGAGAGCCCATAGGCAGCATGTCATCCATGGAAGTGAACGTGGACATGCTGGAGCAG

ATGGACCTGATGGACATATCGGACCAGGAGGCCCTGGACGTCTTCCTTAACTCTGGAGGAGAA

GAGAACACTGTGCTGTCCCCCGCCTTAGGTAGGGTTGACAAACTTGCATTAGCTGAACCAGGG 

CAGTATCGATGCCACTCCCCTCCAAAGGTGAGACGTGAGAACCATCTGCCAGTCACTTACGCA 

TAAACCCCCAAGCTCACAGCC 




ORF Start: ATG at 37 | ORF Stop: TAA at 946


SEQ ID NO: 110 | 303 aa | MW at 34830.2kD


NOV27b, CG114503-02 Protein Sequence


MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWAALH 
RRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTHLEASFE
EVENNLLHLEDLCGQCELERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQKVLEMEHTQQ 
MKLKERQKFFEEAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQMDLMDISDQEAL 
DVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPVTYA 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 27B. 

Table 27B. Comparison of NOV27a against NOV27b.

Protein Sequence | NOV27a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV27b | 1..286 / 1..303 | 273/303 (90%) / 273/303 (90%)



Further analysis of the NOV27a protein yielded the following properties shown in 
Table 27C. 



Table 27C. Protein Sequence Properties NOV27a

PSort analysis: 0.4283 probability located in mitochondrial matrix space; 0.3000 probability located in nucleus; 0.1067 probability located in mitochondrial inner membrane; 0.1067 probability located in mitochondrial intermembrane space

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV27a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 27D.



Table 27D. Geneseq Results for NOV27a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV27a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM79272 | Human protein SEQ ID NO 1934 - Homo sapiens, 351 aa. [WO200157190-A2, 09-AUG-2001] | 1..254 / 1..271 | 252/271 (92%) / 252/271 (92%) | e-137
ABG07626 | Novel human diagnostic protein #7617 - Homo sapiens, 301 aa. [WO200175067-A2, 11-OCT-2001] | 1..236 / 30..282 | 233/253 (92%) / 234/253 (92%) | e-126
AAB43397 | Human ORFX ORF3161 polypeptide sequence SEQ ID NO:6322 - Homo sapiens, 196 aa. [WO200058473-A2, 05-OCT-2000] | 154..254 / 15..116 | 99/102 (97%) / 100/102 (97%) | 3e-48
AAM42664 | Human kidney related polypeptide SEQ ID NO 533 - Homo sapiens, 101 aa. [WO200155323-A2, 02-AUG-2001] | 161..254 / 6..99 | 93/94 (98%) / 93/94 (98%) | 9e-46
AAM99849 | Human excretory related polypeptide SEQ ID NO 586 - Homo sapiens, 101 aa. [WO200155313-A2, 02-AUG-2001] | 161..254 / 6..99 | 93/94 (98%) / 93/94 (98%) | 9e-46

In a BLAST search of public sequence databases, the NOV27a protein was found to
have homology to the proteins shown in the BLASTP data in Table 27E.



Table 27E. Public BLASTP Results for NOV27a

Protein Accession Number | Protein/Organism/Length | NOV27a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9H0U2 | Hypothetical 34.8 kDa protein - Homo sapiens (Human), 303 aa. | 1..286 / 1..303 | 285/303 (94%) / 285/303 (94%) | e-158
Q96EV8 | Unknown (protein for MGC:20210) (Dysbindin) - Homo sapiens (Human), 351 aa. | 1..254 / 1..271 | 252/271 (92%) / 252/271 (92%) | e-137
Q91WZ8 | Dysbindin (Dystrobrevin binding protein 1) - Mus musculus (Mouse), 352 aa. | 1..253 / 1..270 | 217/270 (80%) / 232/270 (85%) | e-118
Q96NV2 | CDNA FLJ30031 fis, clone 3NB692001349 - Homo sapiens (Human), 270 aa. | 65..254 / 1..190 | 189/190 (99%) / 189/190 (99%) | e-103
Q9D3I4 | 5430437B18Rik protein - Mus musculus (Mouse), 271 aa. | 65..253 / 1..189 | 158/189 (83%) / 172/189 (90%) | 3e-86



PFam analysis predicts that the NOV27a protein contains the domains shown in 
Table 27F. 



Table 27F. Domain Analysis of NOV27a 



Pfam Domain 


NOV27a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 



Example 28. 

The NOV28 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 28A. 



Table 28A. NOV28 Sequence Analysis 




SEQ ID NO: 111 


807 bp 


NOV28a, 
CG114588-01 DNA 
Sequence 


CCGAGGCAAGCTCACCTACACCCAGCAGCTGGAGGACCTCAAGAGGCAGCTGGAGGAGGAGGT 


GGCCCAGTGGAGGACCAAGTATGAGACGGACGCCATTCAGCGGACTGAGGAGCTCGAGGAGGC 


CAAGAAGAAGCTGGCCCAGCGGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGG 


("* A.TV"! A 11 A CITC A TTr A C AfTf^f A CZ.CC*C^ A A A ft A ATT* A fiTA A AI\ R A T'/~ , /~" a A A fT'f** ROP ap* t^**/"" a 

ACTGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAA 
GCTGGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAGAAGGCAA 
ATGTGCCGAGCTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCA 

CZC1CTCZ ArtAAfiTACTPf!PAGAAfJGAAf;APArtATATf" APP A Art HP ATP A firflTPfTTTrrr A r* a n 

GCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAG 
CATTGATGACTTAGAAGATAAGTTTCTTTGCTTCACTTCTCCCAAGACTCCCTCGTCGAGCTG 
GATGTCCCACCTCTCTGAGCTCTGCATTTGTCTATTCTCCAGCTGACCCTGGTTCTCTCTCTT 
AGCATCCTGCCTTAGAGCCAGGCACACACTGTGCTTTCTATTGTACAGAAGCTCTTCGTTTCA 


GTGTCAAATAAACACTGTGTAAGCTAAAAAAAJVAAAAAAJ^UU^ 


ORF Start: ATG at 191 


ORF Stop: TGA at 674 


SEQ ID NO: 112 


161 aa 


MW at 18682.7kD 


NOV28a, 
CG114588-01 Protein 
Sequence 


MKVIESRAQKDEEKMEIQEIQLKEAKHIAEDADRKYEEVARKLVIIESDLERAEERAELSEGK 
CAELEEELKTVTNNLKSLEAQAEKYSQKEDRYEEEIKVLSDKLKEAETRAEFAERSVTKLEKS 
IDDLEDKFLCFTSPKTPSSSWMSHLSELCICLFSS 




SEQ ID NO: 113 


1693 bp 


NOV28b, 
CG114588-02 DNA 
Sequence 


GACTTCCGGACTGCTCCTGGCCGCAGGGGGCGCCGCCGTCGCACAGAGAGGCCTGGGCGGGGC 
GGACCGGCGCTGGGCAGCCAGGACAGCCGCGGTAGCCGGGTCCGCAGGGCAGCAGCCGGCCTC 
TCCCACTGCAGCCCTCCCGCCCGCCTACCGTCCGGCGCGATGGCGGGGAGTAGCTCGCTGGAG 
GCGGTGCGCAGGAAGATCCGGAGCCTGCAGGAGCAGGCGGACGCCGCTGAGGAGCGCGCGGGC 
ACCCTGCAGCGCGAGCTGGACCACGAGAGGAAGCTGAGGGAGACCGCTGAAGCCGACGTAGCT 
TCTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCA 
ACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAA 
GTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAACTGAAA 
GAGGCNAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGCTGGTC 
ATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAGAAGGCCAAGTCCGA 
CAGCTGGAAGAACAATTAAGAATAATGGATCAGACCTTGAAAGCATTAATGGCTGCAGAGGAT 
AAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCTTTCCGACAAGCTGAAG 
GAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGAT 


GACTTAGAAGAGAAAGTGGCTCATGCCAAAGAAGAAAACCTTAGTATGCATCAGATGCTGGAT 


CAGACTTTACTGGAGTTAAACAACATGTGAAAACCTCCTTAGCTGCGACCACATTCTTTCATT 


TTGTTTTGTTTTGTTTTGTTTTTAAACACCTGCTTACCCCTTAAATGCAATTTATTTACTTTT 


ACCACTGTCACAGAAACATCCACAAGATACCAGCTAGGTCAGGGGGTGGGGAAAACACATACA 


AAAAGGCAAGCCCATGTCAGGGCGATCCTGGTTCAAATGTGCCATTTCCCGGGTTGATGCTGC 


CACACTTTGTAGAGAGTTTAGCAACACAGTGTGCTTAGTCAGCGTAGGAATCCTCACTAAAGC 


AGGAGAAGTTCCATTCAAAGTGCCAATGATAGAGTCAACAGGAAGGTTAATGTTGGAAACACA 




ATCAGGTGTGGATTGGTGCTACTTTGAACAAAAGGTCCCCCTGTGGTCTTTTGTTCAACATTG 


TACAATGTAGAACTCTGTCCAACACTAATTTATTTTGTCTTGAGTTTTACTACAAGATGAGAC 


TATGGATCCCGCATGCCTGAATTCACTAAAGCCAAGGGTCTGTAAGCCACGCTGCTCTTCNGA 


GACTTCCATTCCTTTCTGATTGGCACACGTGCAGCTCATGACAATCTGTAGGATAACAATCAG 


TGTGGATTTCCACTCTTTTCAGTCCTTCATGTTAAAGATTTAGACACCACATACAACTGGTAA 


AGGACGTTTTCTTGAGAGTTTTAACTATATGTAAACATTGTATAATGATATGGAATAAAATGC 


ACATTGTAGGACATTTTCTAAAAAAAAAAAAAAAGGGNGGCCGCNCTAGNGGATT 


ORF Start: at 1 


ORF Stop: at 745 


SEQ ID NO: 114 


248 aa 


MW at 28746.8kD 


NOV28b, 
CG114588-02 Protein 
Sequence 


MAGSSSLEAVRRKIRSLQEQADAAEERAGTLQRELDHERKLRETAEADVASLNRRIQLVEEEL 
DRAQERLATALQKLEEAEKAADESERGMKVIESRAQKDEEKMEIQEIQLKEAKHIAEDADRKY 
EEVARKLVIIESDLERAEERAELSEGQVRQLEEQLRIMDQTLKALMAAEDKYSQKEDRYEEEI 
KVLSDKLKEAETRAEFAERSVTKLEKSIDDLEEKVAHAKEENLSMHQMLDQTLLELNNM 




SEQ ID NO: 115 


1111 bp 


NOV28c, 
CG114588-03 DNA 
Sequence 


CCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGCCCCTCGCCGCCGCCACCATGGACG 


CCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGAGAACGCCTTGGATCGAGCTGAGC 
AGGCGGAGGCCGACAAGAAGGCGGCGGAAGACAGGAGCAAGCAGCTGGAAGATGAGCTGGTGT 
CACTGCAAAAGAAACTCAAGGGCACCGAAGATGAACTGGACAAATACTCTGAGGCTCTCAAAG 
ATGCCCAGGAGAAGCTGGAGCTGGCAGAGAAAAAGGCCACCGATGCTGAAGCCGACGTAGCTT 
CTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCAA 
CAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCATGAAAG 
TCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAACTGAAAG 
AGGCCAAGCACATTGCTGAAGATGCCGACCGCAAATACGAAGAGGTGGCCCGTAAGCTGGTCA 
TCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAGAAGGCAAATGTGCCG 
AGCTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTGGAGGCTCAGGCTGAGA 
AGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCTTTCCGACAAGCTGAAGG 
AGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCATTGATG 
ACTTAGAAGACGAGCTGTACGCTCAGAAACTGAAGTACAAAGCCATCAGCGAGGAGCTGGACC 
ACGCTCTCAACGATATGACTTCCATATAAGTTTCTTTGCTTCACTTCTCCCAAGACTCCCTCG 
TCGAGCTGGATGTCCCACCTCTCTGAGCTCTGCATTTGTCTATTCTCCAGCTGACCCTGGTTC 



TCTCTCTTAGCATCCTGCCTTAGAGCCAGGCACACACTGTGCTTTCTATTGTACAGAAGCTCT 



TCGTTTCAGTGTCAAATAAACACTGTGTAAGCTAAAAAAA 



ORF Start: ATG at 57 


ORF Stop: TAA at 909 


SEQ ID NO: 116 


284 aa 


MW at 32708.1kD 


NOV28c, 
CG114588-03 Protein 
Sequence 



MDAIKKKMQMLKLDKENALDRAEQAEADKKAAEDRSKQLEDELVSLQKKLKGTEDELDKYSEA 
LKDAQEKLELAEKKATDAEADVASLNRRIQLVEEELDRAQERLATALQKLEEAEKAADESERG 
MKVIESRAQKDEEKMEIQEIQLKEAKHIAEDADRKYEEVARKLVIIESDLERAEERAELSEGK 
CAELEEELKTVTNNLKSLEAQAEKYSQKEDRYEEEIKVLSDKLKEAETRAEFAERSVTKLEKS 
IDDLEDELYAQKLKYKAISEELDHALNDMTSI 



SEQ ID NO: 117 


756 bp 


NOV28d, 
CG114588-04 DNA 
Sequence 


CCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGCCCCTCGCCGCCGCCACCATGGACG 



CCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGAGAACGCCTTGGATCGAGCTGAGC 
AGGCGGAGGCCGACAAGAAGGCGGCGGAAGACAGGAGCAAGCAGCTGGAAGATGAGCTGGTGT 
CACTGCAAAAGAAACTCAAGGGCACCGAAGATGAACTGGACAAATACTCTGAGGCTCTCAAAG 
ATGCCCAGGAGAAGCTGGAGCTGGCAGAGAAAAAGGCCACCGATGCTGAAGCCGACGTAGCTT 
CTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTCTGGCAA 
CAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGTCAGTAAC 
TAAATTGGAGAAAAGCATTGATGACTTAGAAGACGAGCTGTACGCTCAGAAACTGAAGTACAA 



AGCCATCAGCGAGGAGCTGGACCACGCTCTCAACGATATGACTTCCATATAAGTTTCTTTGCT 



TCACTTCTCCCAAGACTCCCTCGTCGAGCTGGATGTCCCACCTCTCTGAGCTCTGCATTTGTC 



TATTCTCCAGCTGACCCTGGTTCTCTCTCTTAGCATCCTGCCTTAGAGCCAGGCACACACTGT 



GCTTTCTATTGTACAGAAGCTCTTCGTTTCAGTGTCAAATAAACACTGTGTAAGCTAAAAAAA 



ORF Start: ATG at 57 


ORF Stop: TAA at 438 


SEQ ID NO: 118 


MW at 14414.9kD 


NOV28d, 
CG114588-04 Protein 
Sequence 



MDAIKKKMQMLKLDKENALDRAEQAEADKKAAEDRSKQLEDELVSLQKKLKGTEDELDKYSEA 
LKDAQEKLELAEKKATDAEADVASLNRRIQLVEEELDRAQERLATALQKLEEAEKAADESERG 
Q 



SEQ ID NO: 119 


1684 bp 


NOV28e, 
CG114588-05 DNA 
Sequence 


CGCTCCTCCGCCCGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCC 



GCCGCCACCATGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGAGAACGCC 



TTGGATCGAGCTGAGCAGGCGGAGGCCGACAAGAAGGCGGCGGAAGACAGGAGCAAGCAGCTC 
GAGGAGGACATCGCGGCCAAGGAGAAGTTGCTGCGGGTGTCGGAGGACGAGCGGGACCGGGTG 
CTGGAGGAGCTGCACAAGGCGGAGGACAGCCTCCTGGCCGCCGAAGAGGCCGCCGCCAAGGCT 
GAAGCCGACGTAGCTTCTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCC 
CAGGAGCGTCTGGCAACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGT 
GAGAGAGGCATGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAG 
GAGATCCAACTGAAAGAGGCCAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTG 
GCCCGTAAGCTGGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCA 
GAAGGCAAATGTGCCGAGCTTGAAGAAGAATTGAAAACTGTGACGAACAACTTGAAGTCACTG 
GAGGCTCAGGCTGAGAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCTT 
TCCGACAAGCTGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTG 
GAGAAAAGCATTGATGACTTAGAAGAGAAAGTGGCTCATGCCAAAGAAGAAAACCTTAGTATG 
CATCAGATGCTGGATCAGACTTTACTGGAGTTAAACAACATGTGAAAACCTCCTTAGCTGCGA 
CCACATTCTTTCGTTTTGTTTTGTTTTGTTTTTAAACACCTGCTTACCCCTTAAATGCAATTT 



ATTTACTTTTACCACTGTCACAGAAACATCCACAAGATACCAGCTAGGTCAGGGGGTGGGGAA 



AACACATACAAAAAGGCAAGCCCATGTCAGGGCGATCCTGGTTCAAATGTGCCATTTCCCGGG 
TTGATGCTGCCACACTTTGTAGAGAGTTTAGCAACACAGTGTGCTTAGTCAGCGTAGGAATCC 
TCACTAAAGCAGAAGAAGTTCCATTCAAAGTGCCAATGATAGAGTCAACAGGAAGGTTAATGT 



TGGAAACACAATCAGGTGTGGATTGGTGCTACTTTGAACAAAAGGTCCCCCTGTGGTCTTTTG 
TTCAACATTGTACAATGTAGAACTCTGTCCAACACTAATTTATTTTGTCTTGAGTTTTACTAC 
AAGATGAGACTATGGATCCCGCATGCCTGAATTCACTAAAGCCAAGGGTCTGTAAGCCACGCT 



GCTCTTCCGAGACTTCCATTCCTTTCTGATTGGCACACGTGCAGCTCATGACAATCTGTAGGA 
TAACAATCAGTGTGGATTTCCACTCTTTTCAGTCCTTCATGTTAAAGATTTAGACACCACATA 
CAACTGGTAAAGGACGTTTTCTTGAGAGTTTTAACTATATGTAAACATTGTATAATGATATGG 
AATAAAATGCACATTGTAGGACATTTTCTAAAAAAAAAAAAAAAAA 



ORF Start: ATG at 73 


ORF Stop: TGA at 925 


SEQ ID NO: 120 


284 aa 


MW at 32677.1kD 


NOV28e, 
CG114588-05 Protein 
Sequence 


MDAIKKKMQMLKLDKENALDRAEQAEADKKAAEDRSKQLEEDIAAKEKLLRVSEDERDRVLEE 
LHKAEDSLLAAEEAAAKAEADVASLNRRIQLVEEELDRAQERLATALQKLEEAEKAADESERG 
MKVIESRAQKDEEKMEIQEIQLKEAKHIAEDADRKYEEVARKLVIIESDLERAEERAELSEGK 
CAELEEELKTVTNNLKSLEAQAEKYSQKEDRYEEEIKVLSDKLKEAETRAEFAERSVTKLEKS 
IDDLEEKVAHAKEENLSMHQMLDQTLLELNNM 




SEQ ID NO: 121 


1199 bp 


NOV28f, 
CG114588-06 DNA 
Sequence 


CCCGACCGCGCGCTCGCCCCGCCGCTCCTGCTGCAGCCCCAGGGCCCCTCGCCGCCGCCACCA 


TGGACGCCATCAAGAAGAAGATGCAGATGCTGAAGCTCGACAAGGAGAACGCCTTGGATCGAG 
CTGAGCAGGCGGAGGCCGACAAGAAGGCGGCGGAAGACAGGAGCAAGCAGCTCGAGGAGGACA 
TCGCGGCCAAGGAGAAGTTGCTGCGGGTGTCGGAGGACGAGCGGGACCGGGTGCTGGAGGAGC 
TGCACAAGGCGGAGGACAGCCTCCTGGCCGCCGAAGAGGCCGCCGCCAAGGCTGAAGCCGACG 
TAGCTTCTCTGAACAGACGCATCCAGCTGGTTGAGGAAGAGTTGGATCGTGCCCAGGAGCGTC 
TGGCAACAGCTTTGCAGAAGCTGGAGGAAGCTGAGAAGGCAGCAGATGAGAGTGAGAGAGGCA 
TGAAAGTCATTGAGAGTCGAGCCCAAAAAGATGAAGAAAAAATGGAAATTCAGGAGATCCAAC 
TGAAAGAGGCAAAGCACATTGCTGAAGATGCCGACCGCAAATATGAAGAGGTGGCCCGTAAGC 
TGGTCATCATTGAGAGCGACCTGGAACGTGCAGAGGAGCGGGCTGAGCTCTCAGAAGGCCAAG 
TCCGACAGCTGGAAGAACAATTAAGAATAATGGATCAGACCTTGAAAGCATTAATGGCTGCAG 
AGGATAAGTACTCGCAGAAGGAAGACAGATATGAGGAAGAGATCAAGGTCCTTTCCGACAAGC 
TGAAGGAGGCTGAGACTCGGGCTGAGTTTGCGGAGAGGTCAGTAACTAAATTGGAGAAAAGCA 
TTGATGACTTAGAAGAGAAAGTGGCTCATGCCAAAGAAGAAAACCTTAGTATGCATCAGATGC 
TGGATCAGACTTTACTGGAGTTAAACAACATGTGAAAACCTCCTTAGCTGCGACCACATTCTT 




TCGTTTTGTTTTGTTTTGTTTTTAAACACCTGCTTACCCCTTAAATGCAATTTATTTACTTTT 




ACCACTGTCACAGAAACATCCACAAGATACCAGCTAGGTCAGGGGGTGGGGAAAACAG \TACA 




AAAAGGCAAGCCCATGTCAGGGCGATCCTGGTTCAAATGTGCCATTTCCCGGGTTGATGCTGC 




CACACTTTGTAGAGAGTTTAGCAACACAGTGTGCTTAGTCAGCGTAGGAATCCTCACTAAAGC 




AG 




ORF Start: ATG at 63 


ORF Stop: TGA at 915 


SEQ ID NO: 122 


284 aa 


MW at 32816.4kD 


NOV28f, 
CG114588-06 Protein 
Sequence 


MDAIKKKMQMLKLDKENALDRAEQAEADKKAAEDRSKQLEEDIAAKEKLLRVSEDERDRVLEE 
LHKAEDSLLAAEEAAAKAEADVASLNRRIQLVEEELDRAQERLATALQKLEEAEKAADESERG 
MKVIESRAQKDEEKMEIQEIQLKEAKHIAEDADRKYEEVARKLVIIESDLERAEERAELSEGQ 
VRQLEEQLRIMDQTLKALMAAEDKYSQKEDRYEEEIKVLSDKLKEAETRAEFAERSVTKLEKS 
IDDLEEKVAHAKEENLSMHQMLDQTLLELNNM 
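


The ORF coordinates and protein lengths listed in Table 28A can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the original disclosure: the helper name `orf_codon_count` is hypothetical, and it assumes that "ORF Start" and "ORF Stop" give the 1-based position of the first base of the start and stop codons, so that the stop codon itself is not counted as a residue.

```python
# Hypothetical helper, added for illustration; not taken from the disclosure.
def orf_codon_count(start: int, stop: int) -> int:
    """Count codons from the first base of the start codon (1-based)
    up to, but not including, the first base of the stop codon."""
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF {start}..{stop} is not a whole number of codons")
    return span // 3


# (ORF start, ORF stop, reported length in aa) as read from Table 28A.
TABLE_28A = {
    "NOV28a": (191, 674, 161),
    "NOV28c": (57, 909, 284),
    "NOV28e": (73, 925, 284),
    "NOV28f": (63, 915, 284),
}

for name, (start, stop, reported_aa) in TABLE_28A.items():
    # Each reported length should equal the number of coding codons.
    assert orf_codon_count(start, stop) == reported_aa, name
```

Under the stated assumption, each listed clone gives a codon count equal to its reported amino acid length, which is a quick way to catch OCR damage in the coordinates.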



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 28B. 



Table 28B. Comparison of NOV28a against NOV28b through NOV28f. 


Protein Sequence 


NOV28a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV28b 


1..133 
91..223 


101/133 (75%) 
109/133 (81%) 


NOV28c 


1..133 
127..259 


117/133 (87%) 
118/133 (87%) 


NOV28d 


66..133 
38..98 


23/68 (33%) 
37/68 (53%) 


NOV28e 


1..133 
127..259 


117/133 (87%) 
118/133 (87%) 



NOV28f 


1..133 
127..259 


101/133 (75%) 
109/133 (81%) 
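


The "Identities/Similarities" cells pair a match count with an alignment length and a percentage. The following sketch, which is not part of the original disclosure, shows one way to parse such a cell and recompute the percentage. The function names are hypothetical, and the use of truncation (rather than rounding) is an assumption based on entries such as 101/133 being reported as 75%.

```python
import re


# Hypothetical helpers, added for illustration only.
def parse_match(cell: str):
    """Split a cell such as '101/133 (75%)' into (matched, length, percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized match cell: {cell!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def floor_percent(matched: int, length: int) -> int:
    """Percentage of matched positions, truncated toward zero (an assumption)."""
    return matched * 100 // length


matched, length, reported = parse_match("101/133 (75%)")
print(matched, length, reported, floor_percent(matched, length))
```

The parser also accepts cells without a space before the parenthesis, which is a common extraction artifact in these tables.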



Further analysis of the NOV28a protein yielded the following properties shown in 
Table 28C. 



Table 28C. Protein Sequence Properties NOV28a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 






A search of the NOV28a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 28D. 



Table 28D. Geneseq Results for NOV28a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV28a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB90770 


Human Tumour Endothelial Marker 
polypeptide SEQ ID NO 273 - Homo 
sapiens, 284 aa. [WO200210217-A2, 
07-FEB-2002] 


1..133 
127..259 


132/133 (99%) 
133/133 (99%) 


3e-67 


AAG66545 


Human interferon-alpha induced 
polypeptide, TPM1 - Homo sapiens, 
284 aa. [WO200159155-A2, 16-AUG- 
2001] 


1..133 
127..259 


132/133 (99%) 
133/133 (99%) 


3e-67 


AAM78512 


Human protein SEQ ID NO 1174 - 
Homo sapiens, 284 aa. [WO200157190- 
A2, 09-AUG-2001] 


1..133 
127..259 


132/133 (99%) 
133/133 (99%) 


3e-67 


AAY92334 


Human alpha-tropomyosin - Homo 
sapiens, 284 aa. [WO200020448-A2, 
13-APR-2000] 


1..133 
127..259 


132/133 (99%) 
133/133 (99%) 


3e-67 


ABB57037 


Mouse ischaemic condition related 
protein sequence SEQ ID NO:47 - Mus 
musculus, 284 aa. [WO200188188-A2, 
22-NOV-2001] 


1..133 
127..259 


131/133 (98%) 
133/133 (99%) 


7e-67 



In a BLAST search of public sequence databases, the NOV28a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 28E. 



Table 28E. Public BLASTP Results for NOV28a 


Protein 
Accession 
Number 


Protein/Organism/Length 


NOV28a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q91XN6 


Tropomyosin alpha isoform - Rattus 
norvegicus (Rat), 287 aa. 


1..161 
127..287 


156/161 (96%) 
158/161 (97%) 


1e-81 


P18343 


Tropomyosin alpha chain, brain-2 
(TMBR-2) - Rattus norvegicus (Rat), 
251 aa. 


1..161 
91..251 


156/161 (96%) 
158/161 (97%) 


1e-81 


Q9Y427 


Hypothetical 34.9 kDa protein - 
Homo sapiens (Human), 308 aa 
(fragment). 


1..133 
151..283 


132/133 (99%) 
133/133 (99%) 


9e-67 


P09493 


Tropomyosin 1 alpha chain (Alpha- 
tropomyosin) - Homo sapiens 
(Human), 284 aa. 


1..133 
127..259 


132/133 (99%) 
133/133 (99%) 


9e-67 


C39816 


tropomyosin 5a, fibroblast - rat, 248 
aa. 


1..133 
91..223 


131/133 (98%) 
133/133 (99%) 


2e-66 
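


The Expect values in these tables use a shorthand in which a bare exponent such as `e-137` stands for 1e-137. The sketch below, which is not part of the original disclosure, shows one way to convert such cells to numbers so that hits can be ranked. The function name `parse_expect` is hypothetical.

```python
# Hypothetical helper, added for illustration only.
def parse_expect(cell: str) -> float:
    """Convert an Expect cell ('3e-67', 'e-137', '0.0') to a float.

    A leading bare 'e' is read as an implied mantissa of 1.
    """
    text = cell.strip().lower()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Rank a few Expect cells from most to least significant (smallest first).
cells = ["9e-67", "1e-81", "2e-66"]
ranked = sorted(cells, key=parse_expect)
print(ranked)
```

Smaller Expect values indicate matches that are less likely to arise by chance, so sorting in ascending order places the strongest hits first.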



PFam analysis predicts that the NOV28a protein contains the domains shown in 
Table 28F. 



Table 28F. Domain Analysis of NOV28a 



Pfam Domain 


NOV28a Match Region 


Identities/ 
Similarities 
for the Matched Region 


Expect Value 


Tropomyosin 


1..137 


115/137 (84%) 
137/137 (100%) 


2.5e-109 





Example 29. 

The NOV29 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 29A. 



Table 29A. NOV29 Sequence Analysis 


SEQ ID NO: 123 


3344 bp 


NOV29a, 
CG114621-01 DNA 
Sequence 

TTGACTGTATCGCCGGAATTCATGAAGTGCGTCTTGGTGGCCACTGAGGGCGCAGAGGTCCTC 

TTCTACTGGACAGATCAGGAGTTTGAAGAGAGTCTCCGGCTGAAGTTCGGGCAGTCAGAGAAT 

GAGGAAGAAGAGCTCCCTGCCCTGGAGGACCAGCTCAGCACCCTCCTAGCCCCGGTCATCATC 

TCCTCCATGACGATGCTGGAGAAGCTCTCGGACACCTACACCTGCTTCTCCACGGAAAATGGC 

AACTTCCTGTATGTCCTTCACCTGGTGCGGCCCCCAGACCTGGCGCAGCGTGTCCAGCTGTGG 

GAGCACTTCCAGAGCCTGCTGTGGACCTACAGCCGCCTGCGGGAGCAGGAGCAGTGCTTCGCC 

GTGGAGGCCCTGGAGCGACTGATTCACCCCCAGCTCTGTGAGCTGTGCATAGAGGCGCTGGAG 

CGGCACGTCATCCAGGCTGTCAACACCAGCCCCGAGCGGGGAGGCGAGGAGGCCCTGCATGCC 

TTCCTGCTCGTGCACTCCAAGCTGCTGGCATTCTACTCTAGCCACAGTGCCAGCTCCCTGCGC 

CCGGCCGACCTGCTTGCCCTCATCCTCCTGGTTCAGGACCTCTACCCCAGCGAGAGCACAGCA 

GAGGACGACATTCAGCCTTCCCCGCGGAGGGCCCGGAGCAGCCAGAACATCCCCGTGCAGCAG 

GCCTGGAGCCCTCACTCCACGGGCCCAACTGGGGGGAGCTCTGCAGAGACGGAGACAGACAGC 

TTCTCCCTCCCTGAGGAGTACTTCACACCAGCTCCTTCCCCTGGCGATCAGAGCTCAGGTAGC 

ACCATCTGGCTGGAGGGGGGCACCCCCCCCATGGATGCCCTTCAGATAGCAGAGGACACCCTC 

CAAACACTGGTTCCCCACTGCCCTGTGCCTTCCGGCCCCAGAAGGATCTTCCTGGATGCCAAC 

GTGAAGGAAAGCTACTGCCCCCTAGTGCCCCACACCATGTACTGCCTGCCCCTGTGGCAGGGC 

ATCAACCTGGTGCTCCTGACCAGGAGCCCCAGCGCGCCCCTGGCCCTGGTTCTGTCCCAGCTG 

ATGGATGGCTTCTCCATGCTGGAGAAGAAGCTGAAGGAAGGGCCGGAGCCCGGGGCCTCCCTG 

CGCTCCCAGCCCCTCGTGGGAGACCTGCGCCAGAGGATGGACAAGTTTGTCAAGAATCGAGGG 

GCACAGGAGATTCAGAGCACCTGGCTGGAGTTTAAGGCCAAGGCTTTCTCCAAAAGTGAGCCC 

GGATCCTCCTGGGAGCTGCTCCAGGCATGTGGGAAGCTGAAGCGGCAGCTCTGCGCCATCTAC 

CGGCTGAACTTTCTGACCACAGCCCCCAGCAGGGGAGGCCCACACCTGCCCCAGCACCTGCAG 

GACCAAGTGCAGAGGCTCATGCGGGAGAAGCTGACGGACTGGAAGGACTTCTTGCTGGTGAAG 

AGCAGGAGGAACATCACCATGGTGTCCTACCTAGAAGACTTCCCAGGCTTGGTGCACTTCATC 

TATGTGGACCGCACCACTGGGCAGATGGTGGCGCCTTCCCTCAACTGCAGTCAAAAGACCTCG 

TCGGAGTTGGGCAAGGGGCCGCTGGCTGCCTTTGTCAAAACTAAGGTCTGGTCTCTGATCCAG 

CTGGCGCGCAGATACCTGCAGAAGGGCTACACCACGCTGCTGTTCCGGGAGGGGGATTTCTAC 

TGCTCCTACTTCCTGTGGTTCGAGAATGACATGGGGTACAAACTCCAGATGATCGAGGTGCCC 

GTCCTCTCCGACGACTCAGTGCCTATCGGCATGCTGGGAGGAGACTACTACAGGAAGCTCCTG 

CGCTACTACAGCAAGAACCGCCCAACCGAGGCTGTCAGGTGCTACGAGCTGCTGGCCCTGCAC 

CTGTCTGTCATCCCCACTGACCTGCTGGTGCAGCAGGCCGGCCAGCTGGCCCGGCGCCTCTGG 

GAGGCCTCCCGTATCCCCCTGCTCTAGGCCAAGGTGGCCGCAGTCTGCCTTTGCATCCTGTCC 



TCCAGCCACCCTTGCTTGCCACTGTTCCCCATGACGAGAGCCTCCTGTCTGCAGTGGCCATCC 



TGAGGATAGGGCAGAGTGCCCAGGGTGGCCCCAGGGCTTCTAAAACCCCACCTAGACCACCCT 



CCATGTCAGGTACTGAGCAAGGCCCCAGATCCTTCTCTCTGGAGGAAGAGGGAAGCCCAGGGG 



TCCTGTTTGTAAAACAACGGTGGCAACAGCTCCTCTTCCAGAGCTGCCTCTGCCTTTATCCTG 



GGAGATGGGGAGGAAGCCCCATCTCTGCTGTTCCCTGCGTGGAGGAAGCCCACCCAGCAAGCT 



CTCTCCTACCCCAGGTAAAAGGTGCTCCTTTGCCTGGGTTTGAATTCCAGCGCTGCCACTTCC 



TCTCTGCACCTCCTGGCAAGTTTCTTCTATTCCCCACGTTTAAAGCGATGGCACCTCCGTCCC 



AGGGTGGTGTGAGGATTACCCAGTGTGGTAGGTGCTCAATAAATGTTGGTCATTGTTATCACT 



GAAGCCCAACATGCTAGTGCTTCTAGACCCTTCTGTCAGTGCTGATAAGCCCTTGCTAAGTCC 



CAGCCCCTTCATGCTTGGCTGGCGTCTGCCCTAGGGCTGGGGTTCTCAAGCCCCTGGCCCTGG 



CCCAGAGATTTGGATTCCCTTGGCGGCCGTGGAGCCCAGGCTTTGATGTCTTTCAAAGCTTCT 



GTGGTGCGCCCTGGATTGAGAACCACCACCCGAGGGGTACAGCCCCTCTCTTCCAACCGAGAA 



GTTCCTGTCCAGAATGGACCCAGGGACAAGAGACCCTGAGAGCCCTGGGACTGGGAGTGTCTG 



CTCCTCTGAGCCAGGAGGCCGGTGCTGGGCCAGAGAGGACGGCGTGGCGAAAGTCAGCGTCCA 



CTGCAGCACAGGATCAGATGGCCGTGTGCTGTGCATGCAGGAGCCTCGCCTTCTGTGTCTTTA 



GTCTTGAGCCAAAATTTGCTCAAAAGACTGATCTCTTCCTTGCAGGGAACAGCTTTGGGGCTG 



GGGGAACTAGAACCCACATGTTGGTCTAAACCCTGAGAAGGTGGCAGTGAGGAAGTATCCCCT 



CAGGTGACTGGATCTGTGTTCCTCCTTAACATCATCTGATGGAATGGCAATGAAAAGCGTGGA 



TTGTGGAAAATACAGAAAAACATAAAGGAAAAAACTCCAATCCCCTGAGCCCACCACTGTTCA 



GGACCCCTGCTTTTGTCACCTACTATTTCCCTTTAGTTTTTAGCAGCGGCTGGATGTGATATG 



TCTAGTTTAACCAGTCCCCTTGATCTTTCTATATAATAAATAACACAGGAGTGAACATCCTGA 



ATCAG 



ORF Start: ATG at 22 


ORF Stop: TAG at 1978 


SEQ ID NO: 124 


652 aa 


MW at 73743.7kD 


NOV29a, 
CG114621-01 Protein 
Sequence 



MKCVLVATEGAEVLFYWTDQEFEESLRLKFGQSENEEEELPALEDQLSTLLAPVIISSMTMLE 
KLSDTYTCFSTENGNFLYVLHLVRPPDLAQRVQLWEHFQSLLWTYSRLREQEQCFAVEALERL 
IHPQLCELCIEALERHVIQAVNTSPERGGEEALHAFLLVHSKLLAFYSSHSASSLRPADLLAL 
ILLVQDLYPSESTAEDDIQPSPRRARSSQNIPVQQAWSPHSTGPTGGSSAETETDSFSLPEEY 
FTPAPSPGDQSSGSTIWLEGGTPPMDALQIAEDTLQTLVPHCPVPSGPRRIFLDANVKESYCP 
LVPHTMYCLPLWQGINLVLLTRSPSAPLALVLSQLMDGFSMLEKKLKEGPEPGASLRSQPLVG 
DLRQRMDKFVKNRGAQEIQSTWLEFKAKAFSKSEPGSSWELLQACGKLKRQLCAIYRLNFLTT 
APSRGGPHLPQHLQDQVQRLMREKLTDWKDFLLVKSRRNITMVSYLEDFPGLVHFIYVDRTTG 
QMVAPSLNCSQKTSSELGKGPLAAFVKTKVWSLIQLARRYLQKGYTTLLFREGDFYCSYFLWF 
ENDMGYKLQMIEVPVLSDDSVPIGMLGGDYYRKLLRYYSKNRPTEAVRCYELLALHLSVIPTD 
LLVQQAGQLARRLWEASRIPLL 




SEQ ID NO: 125 


2109 bp 


NOV29b, 
CG114621-02 DNA 
Sequence 


GCCAAGATGAAGTGCGTCTTGGTGGCCACTGAGGGCGCAGAGGTCCTCTTCTACTGGACAGAT 

CAGGAGTTTGAAGAGAGTCTCCGGCTGAAGTTCGGGCAGTCAGAGAATGAGGAAGAAGAGCTC 

CCTGCCCTGGAGGACCAGCTCAGCACCCTCCTAGCCCCGGTCATCATCTCCTCCATGACGATG 

CTGGAGAAGCTCTCGGACACCTACACCTGCTTCTCCACGGAAAATGGCAACTTCCTGTATGTC 

CTTCACCTGTTTGGAGAATGCCTGTTCATTGCCATCAATGGTGACCACACCGAGAGCGAGGGG 

GACCTGCGGCGGAAGCTGTATGTGCTCAAGTACCTGTTTGAAGTGCACTTTGGGCTGGTGACT 

GTGGACGGTCATCTTATCCGAAAGGAGCTGCGGCCCCCAGACCTGGCGCAGCGTGTCCAGCTG 

TGGGAGCACTTCCAGAGCCTGCTGTGGACCTACAGCCGCCTGCGGGAGCAGGAGCAGTGCTTC 

GCCGTGGAGGCCCTGGAGCGACTGATTCACCCCCAGCTCTGTGAGCTGTGCATAGAGGCGCTG 

GAGCGGCACGTCATCCAGGCTGTCAACACCAGCCCCGAGCGGGGAGGCGAGGAGGCCCTGCAT 

GCCTTCCTGCTCGTGCACTCCAAGCTGCTGGCATTCTACTCTAGCCACAGTGCCAGCTCCCTG 

CGCCCGGCCGACCTGCTTGCCCTCATCCTCCTGGTTCAGGACCTCTACCCCAGCGAGAGCACA 

GCAGAGGACGACATTCAGCCTTCCCCGCGGAGGGCCCGGAGCAGCCAGAACATCCCCGTGCAG 

CAGGCCTGGAGCCCTCACTCCACGGGCCCAACTGGGGGGAGCTCTGCAGAGACGGAGACAGAC 

AGCTTCTCCCTCCCTGAGGAGTACTTCACACCAGCTCCTTCCCCTGGCGATCAGAGCTCAGGT 

AGCACCATCTGGCTGGAGGGGGGCACCCCCCCCATGGATGCCCTTCAGATAGCAGAGGACACC 

CTCCAAACACTGGTTCCCCACTGCCCTGTGCCTTCCGGCCCCAGAAGGATCTTCCTGGATGCC 

AACGTGAAGGAAAGCTACTGCCCCCTAGTGCCCCACACCATGTACTGCCTGCCCCTGTGGCAG 

GGCATCAACCTGGTGCTCCTGACCAGGAGCCCCAGCGCGCCCCTGGCCCTGGTTCTGTCCCAG 

CTGATGGATGGCTTCTCCATGCTGGAGAAGAAGCTGAAGGAAGGGCCGGAGCCCGGGGCCTCC 

CTGCGCTCCCAGCCCCTCGTGGGAGACCTGCGCCAGAGGATGGACAAGTTTGTCAAGAATCGA 

GGGGCACAGGAGATTCAGAGCACCTGGCTGGAGTTTAAGGCCAAGGCTTTCTCCAAAAGTGAG 

CCCGGATCCTCCTGGGAGCTGCTCCAGGCATGTGGGAAGCTGAAGCGGCAGCTCTGCGCCATC 

TACCGGCTGAACTTTCTGACCACAGCCCCCAGCAGGGGAGGCCCACACCTGCCCCAGCACCTG 

CAGGACCAAGTGCAGAGGCTCATGCGGGAGAAGCTGACGGACTGGAAGGACTTCTTGCTGGTG 

AAGAGCAGGAGGAACATCACCATGGTGTCCTACCTAGAAGACTTCCCAGGCTTGGTGCACTTC 

ATCTATGTGGACCGCACCACTGGGCAGATGGTGGCGCCTTCCCTCAACTGCAGTCAAAAGACC 

TCGTCGGAGTTGGGCAAGGGGCCGCTGGCTGCCTTTGTCAAAACTAAGGTCTGGTCTCTGATC 

CAGCTGGCGCGCAGATACCTGCAGAAGGGCTACACCACGCTGCTGTTCCGGGAGGGGGATTTC 

TACTGCTCCTACTTCCTGTGGTTCGAGAATGACATGGGGTACAAACTCCAGATGATCGAGGTG 

CCCGTCCTCTCCGACGACTCAGTGCCTATCGGCATGCTGGGAGGAGACTACTACAGGAAGCTC 

CTGCGCTACTACAGCAAGAACCGCCCAACCGAGGCTGTCAGGTGCTACGAGCTGCTGGCCCTG 

CACCTGTCTGTCATCCCCACTGACCTGCTGGTGCAGCAGGCCGGCCAGCTGGCCCGGCGCCTC 

TGGGAGGCCTCCCGTATCCCCCTGCTCTAG 




ORF Start: ATG at 7 


ORF Stop: TAG at 2107 


SEQ ID NO: 126 


700 aa 


MW at 79319.0kD 


NOV29b, 
CG114621-02 Protein 
Sequence 


MKCVLVATEGAEVLFYWTDQEFEESLRLKFGQSENEEEELPALEDQLSTLLAPVIISSMTMLE 

KLSDTYTCFSTENGNFLYVLHLFGECLFIAINGDHTESEGDLRRKLYVLKYLFEVHFGLVTVD 

GHLIRKELRPPDLAQRVQLWEHFQSLLWTYSRLREQEQCFAVEALERLIHPQLCELCIEALER 

HVIQAVNTSPERGGEEALHAFLLVHSKLLAFYSSHSASSLRPADLLALILLVQDLYPSESTAE 

DDIQPSPRRARSSQNIPVQQAWSPHSTGPTGGSSAETETDSFSLPEEYFTPAPSPGDQSSGST 

IWLEGGTPPMDALQIAEDTLQTLVPHCPVPSGPRRIFLDANVKESYCPLVPHTMYCLPLWQGI 

NLVLLTRSPSAPLALVLSQLMDGFSMLEKKLKEGPEPGASLRSQPLVGDLRQRMDKFVKNRGA 

QEIQSTWLEFKAKAFSKSEPGSSWELLQACGKLKRQLCAIYRLNFLTTAPSRGGPHLPQHLQD 

QVQRLMREKLTDWKDFLLVKSRRNITMVSYLEDFPGLVHFIYVDRTTGQMVAPSLNCSQKTSS 

ELGKGPLAAFVKTKVWSLIQLARRYLQKGYTTLLFREGDFYCSYFLWFENDMGYKLQMIEVPV 

LSDDSVPIGMLGGDYYRKLLRYYSKNRPTEAVRCYELLALHLSVIPTDLLVQQAGQLARRLWE 
ASRIPLL 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 29B. 



Table 29B. Comparison of NOV29a against NOV29b. 


Protein Sequence 


NOV29a Residues/ 
Match Residues 


Identities/ 
Similarities for the Matched Region 


NOV29b 


1..652 
1..700 


617/700 (88%) 
618/700 (88%) 



Further analysis of the NOV29a protein yielded the following properties shown in 
Table 29C. 



Table 29C. Protein Sequence Properties NOV29a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3921 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 






A search of the NOV29a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 29D. 



Table 29D. Geneseq Results for NOV29a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV29a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB67493 


Drosophila melanogaster polypeptide 
SEQ ID NO 29271 - Drosophila 
melanogaster, 596 aa. [WO200171042- 
A2, 27-SEP-2001] 


465..639 
409..583 


71/187 (37%) 
104/187 (54%) 


9e-27 


AAB93149 


Human protein sequence SEQ ID 
NO:12061 - Homo sapiens, 120 aa. 
[EP1074617-A2, 07-FEB-2001] 


204..280 
16..92 


22/77 (28%) 
34/77 (43%) 


0.19 


AAM93779 


Human polypeptide, SEQ ID NO: 3792 
- Homo sapiens, 120 aa. [EP1130094- 
A2, 05-SEP-2001] 


204..280 
16..92 


22/77 (28%) 
34/77 (43%) 


0.19 


AAU48536 


Propionibacterium acnes immunogenic 
protein #9432 - Propionibacterium 
acnes, 141 aa. [WO200181581-A2, 01- 
NOV-2001] 


213..302 
34..139 


37/109 (33%) 
41/109 (36%) 


0.55 


AAE16280 


Human kinase PKIN-26 protein - Homo 
sapiens, 660 aa. [WO200196547-A2, 
20-DEC-2001] 


212..279 
10..77 


22/68 (32%) 
28/68 (40%) 


0.72 



In a BLAST search of public sequence databases, the NOV29a protein was found to 
have homology to the proteins shown in the BLASTP data in Table 29E. 



Table 29E. Public BLASTP Results for NOV29a 


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV29a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q92902 


Hermansky-Pudlak syndrome 1 protein - 
Homo sapiens (Human), 700 aa. 


1..652 
1..700 


651/700 (93%) 
652/700 (93%) 


0.0 


Q8WXE5 


Hermansky-Pudlak syndrome - Homo 
sapiens (Human), 700 aa. 


1..652 
1..700 


650/700 (92%) 
652/700 (92%) 


0.0 


Q99MK7 


Hermansky-Pudlak syndrome protein - 
Rattus norvegicus (Rat), 706 aa. 


1..651 
1..705 


533/706 (75%) 
583/706 (82%) 


0.0 


O08983 


Hermansky-Pudlak syndrome 1 protein 
homolog - Mus musculus (Mouse), 704 
aa. 


1..651 
1..703 


524/704 (74%) 
577/704 (81%) 


0.0 


Q9UH26 


DJ1119A7.3 (Putative novel protein 
similar to HPS (Hermansky-Pudlak 
syndrome protein)) - Homo sapiens 
(Human), 149 aa (fragment). 


1..121 
1..149 


107/149 (71%) 
113/149 (75%) 


5e-53 



PFam analysis predicts that the NOV29a protein contains the domains shown in 
Table 29F. 



Table 29F. Domain Analysis of NOV29a 



Pfam Domain 


NOV29a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 





Example 30. 

The NOV30 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 30A. 



Table 30A. NOV30 Sequence Analysis 




SEQ ID NO: 127 


2450 bp 


NOV30a, 
CG114649-01 DNA 
Sequence 


CGTCGTCATCTTGGATCCGCGGGACAAGAAAATTCATGCGAGGGAGACGTGGTGGGCGGTCCT 

TCCTGTGACACGACCCTTGAGTGACAGTTCTATTTGATTGCCTCCGGTACTGTGAGGAAAGGA 


C ACG AC TCT ATGG TG AGG AC TG ATGG AC AT A C ATT AT CTG AG AAAAG AAACT AC C AGG TG A P h 
AACAGCATGTTTGGTGCTTCAAGAAAGAAGTTTGTAGAGGGGGTCGACAGTGACTACCATGAC 
GAAAACATGTACTACAGCCAGTCTTCTATGTTTCCACATCGGTCAGAAAAAGATATGCTGGCA 
TCACCATCTACATCAGGTCAGCTGTCTCAGTTTGGGGCAAGTTTATACGGGCAACAAAACGGA 
AGTGAAAATGTGACAGGATTGGACCTTTCAGATTTCCCAGCATTAGCAGACCGAAACAGGAGG 
GAAGGAAGTGGTAACCCAACTCCATTAATAAACCCCTTGGCTGGAAGAGCTCCTTATGTTGGA 
ATGGTAACAAAACCAGCAAATGAACAATCCCAGGACTTCTCAATACACAATGAAGATTTTCCA 
GCATTACCAGGCTCCAGCTATAAAGATCCAACATCAAGTAATGATGACAGTAAATCTAATTTG 
AATACATCTGGCAAGACAACTTCAAGTACAGATGGACCCAAATTCCCTGGAGATAAAAGTTCA 
ACAACACAAAATAATAACCAGCAGAAAAAAGGGATCCAGGTGTTACCTGATGGTCGGGTTACT 
AACATTCCTCAAGGGATGGTGACGGACCAATTTGGAATGATTGGCCTGTTAACATTTATCAGG 
GCAGCAGAGACAGACCCAGGAATGGTACATCTTGCATTAGGAAGTGACTTAACAACATTAGGC 
CTCAATCTGAACTCTCCTGAAAATCTCTACCCCAAATTTGCGTCACCCTGGGCATCTTCACCT 
TGTCGACCTCAAGACATAGACTTCCATGTTCCATCTGAGTACTTAACGAACATTCACATTAGG 
GATAAGCTGGCTGCAATAAAACTTGGCCGATATGGTGAAGACCTTCTCTTCTATCTCTATTAC 
ATGAATGGAGGAGACGTATTACAACTTTTAGCTGCAGTGGAGCTTTTTAACCGTGATTGGAGA 
TACCACAAAGAAGAACGAGTATGGATTACCAGGGCACCAGGCATGGAGCCAACAATGAAAACC 
AATACCTATGAGAGGGGAACATATTACTTCTTTGACTGTCTTAACTGGAGGAAAGTAGCTAAG 
GAGTTCCATCTGGAATATGACAAATTAGAAGAACGGCCTCACCTGCCATCCACCTTCAACTAC 
AACCCTGCTCAGCAAGCCTTCTAAAAAAAAAAAAAAAAAAAAAAAAAAAGACTTCCCTTTTCT 


TGGGGTATGGCTGTCTCAGCACAATACTCAACATAACTGCAGAACTGATGTGGCTCAGGCACC 


CTGGTTTTAATTCCTTGAGGATCTGGCAATTGGCTTACGCAAAAGGTCACCATTTGAGGTCCT 


GCCTTACTAATTATGTGCTGCCCAACAACTAAATTTGTAATTTGTTTTTCTCTAGTTTGAGCA 


GGGTCTGAATTTTTTCATTTATTTCCTTTTTTGCCAGCAGACAGACTTGAGTCTGTAAAGACA 


AGCAAATACACTGACAGAAGTTTACCATAGTTTCTAAAATGTAAAAAAGAAAACCCCCAAAAG 


ACTCAAGAAAATTAGACCACAAATTTTGCATTGTTCATTGTAGCACTATTGGTAATAAAATAA 


CAAATGTTTGTGCATTTTTATGTGAAGATCCTTCTCGTATTTCATTTGGAAAGATGAGCAAGA 


GGTCTGCTTCCTTCATTTTACTTCCCCTTCTGTTTTTGAAAGGCAGTTTCGCCAAGCTTAATG 


CAAGAATATCTGACTGTTTAGAAGAAAGATATTGCCACAATCTCTGGATGGTTTTCCAGGGTT 


GTGTTATTACTGAGCTTCATCTTTCCAGAATGAGCAAAACACTGTCCAGTCTTTGTTACGATT 


TTGTAATAAATGTGTACATTTTTTTTAAATTTTTGGACATCACATGAATAAAGGTATGTATGT 


ACGAATGTGTATATATTATATATATGACATCTATTTTGGAAAATGTTTGCCCTGCTGTACCTC 


ATTTTTAGGAGGTGTGCATGGATGCAATATATGAAAATGGGACATTCTGGAACTGCTGGTCAG 


GGGACTTTGTCGCCCTGTGCACTAAAAGGGCCAGATTTTCAGCAGCCAAGGACATCCATACCC 


AAGTGAATGTGATGGGACTTAAAAGAAGTGAACTGAGACAATTCACTCTGGCTGTTTGAACAG 


CAGCGTTTCATAGGAAGAGAAAAAAAGATCAATCTTGTATTTTCTGACCACATAAAGGCTTCT 


TCTCTTTGTAATAAAGTAGAAAAGCTCTCCTCAAAAAAAAAAAAAAAAAACTCGAG 




ORF Start: ATG at 136 


ORF Stop: TAA at 1345 


SEQ ID NO: 128 


403 aa 


MW at 45257.9kD 


NOV30a, 
CG114649-01 Protein 
Sequence 


MVRTDGHTLSEKRNYQVTNSMFGASRKKFVEGVDSDYHDENMYYSQSSMFPHRSEKDMLASPS 
TSGQLSQFGASLYGQQNGSENVTGLDLSDFPALADRNRREGSGNPTPLINPLAGRAPYVGMVT 
KPANEQSQDFSIHNEDFPALPGSSYKDPTSSNDDSKSNLNTSGKTTSSTDGPKFPGDKSSTTQ 
NNNQQKKGIQVLPDGRVTNIPQGMVTDQFGMIGLLTFIRAAETDPGMVHLALGSDLTTLGLNL 
NSPENLYPKFASPWASSPCRPQDIDFHVPSEYLTNIHIRDKLAAIKLGRYGEDLLFYLYYMNG 
GDVLQLLAAVELFNRDWRYHKEERVWITRAPGMEPTMKTNTYERGTYYFFDCLNWRKVAKEFH 
LEYDKLEERPHLPSTFNYNPAQQAF 
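


The relationship between a nucleotide sequence and its encoded polypeptide in Table 30A can be illustrated by translating the first few codons of the NOV30a open reading frame. The sketch below is not part of the original disclosure. It assumes the standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical. The input is the 21-nucleotide stretch beginning at the ATG at position 136, read with the OCR spacing removed.

```python
# Standard genetic code, ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon.

    Codons containing ambiguity codes such as N raise KeyError.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First seven codons of the NOV30a ORF as read from Table 30A.
orf_prefix = "ATGGTGAGGACTGATGGACAT"
print(translate(orf_prefix))  # prints "MVRTDGH"
```

The translated prefix agrees with the first seven residues of the NOV30a protein sequence listed above. The same approach can help resolve OCR damage in a protein line when the corresponding nucleotide line is intact.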



Further analysis of the NOV30a protein yielded the following properties shown in 
Table 30B. 



Table 30B. Protein Sequence Properties NOV30a 


PSort 
analysis: 


0.7600 probability located in nucleus; 0.2124 probability located in microbody 
(peroxisome); 0.1589 probability located in lysosome (lumen); 0.1000 probability 
located in mitochondrial matrix space 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV30a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publications, yielded several 
homologous proteins shown in Table 30C. 



Table 30C. Geneseq Results for NOV30a 



Geneseq 
Identifier 


Protein/Organism/Lcngth (Patent #, 
Date] 


NOV30a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


| Expect 
| Value 

I 


AAW67820 


Human secreted protein encoded by 
gene 14 clone HE2DE47 - Homo 
sapiens, 541 aa. [W09842738-A1, 01- 
OCT-1998] 


66..403 
203..540 


324/338 (95%) 
330/338 (96%) 


0.0 


AAM79843 


Human protein SEQ ID NO 3489 - 

Homo sapiens, 45 1 aa. 

[ WO200 1 57 1 90-A2, 09-AUG-200 1 ] 


66..403 
108..451 


323/344 (93%) 
329/344 (94%) 


0.0 

i 

1 


AAW54379 


Ceil division cycle protein HCDCB - 
Homo sapiens. 280 aa. [W0981 1220- 
A2, 19-MAR-1998] 


124..403 
1..280 


279/280 (99%) 
279/280 (99%) 


e-!67 


AAM78859 


Human protein SEQ ID NO 1521 - 

Homo sapiens, 439 aa. 

[ WO200 1 57 1 90-A2, 09-AUG-200 1 ] 


66..294 
203..43I 


215/229 (93%) 
221/229 (95%) 


e-125 


ABB58904 


Drosophila melanogaster polypeptide 
SEQIDNO3504-Drosophiia 
melanogaster, 579 aa. [WO200I71042- 
A2,27-SEP-2001] 


165..395 
348.575 


120/231 (51%) 
166/23! (70%) 


2e-66 
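The residue ranges and the alignment denominators in Table 30C can be related to one another. The sketch below, with the hypothetical helper name `span_length`, computes the number of residues covered by a range written as `start..end`. For an ungapped alignment this span equals the denominator of the identity fraction; when the alignment contains gaps the denominator can be larger than the span, as in the second row of the table.

```python
def span_length(residue_range: str) -> int:
    """Return the number of residues in a range such as '66..403' (inclusive)."""
    start, end = (int(part) for part in residue_range.split(".."))
    return end - start + 1


# First row of Table 30C: NOV30a residues 66..403 aligned over 338 positions.
print(span_length("66..403"))
# Third row: NOV30a residues 124..403 aligned over 280 positions.
print(span_length("124..403"))
```

Under these assumptions the two calls print 338 and 280, matching the denominators 324/338 and 279/280 reported for those rows.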


In a BLAST search of public sequence databases, the NOV30a protein was found to
have homology to the proteins shown in the BLASTP data in Table 30D.


Table 30D. Public BLASTP Results for NOV30a

Protein Accession Number | Protein/Organism/Length | NOV30a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9NZN8 | Not2p (CCR4-NOT transcription complex, subunit 2) (Similar to CCR4-NOT transcription complex, subunit 2) - Homo sapiens (Human), 540 aa. | 66..403; 203..540 | 324/338 (95%); 330/338 (96%) | 0.0
Q9H3E0 | MSTP046 - Homo sapiens (Human), 365 aa. | 66..403; 28..365 | 324/338 (95%); 330/338 (96%) | 0.0
Q9D0P1 | 2600016M12Rik protein - Mus musculus (Mouse), 455 aa. | 66..403; 118..455 | 323/338 (95%); 330/338 (97%) | 0.0
Q9NWR6 | CDNA FLJ20655 fis, clone KAT01590 - Homo sapiens (Human), 490 aa. | 66..403; 203..490 | 274/338 (81%); 280/338 (82%) | e-157
Q9P028 | HSPC131 - Homo sapiens (Human), 488 aa. | 66..337; 203..474 | 256/272 (94%); 263/272 (96%) | e-150



PFam analysis predicts that the NOV30a protein contains the domains shown in the 
Table 30E. 



Table 30E. Domain Analysis of NOV30a

Pfam Domain | NOV30a Match Region | Identities; Similarities for the Matched Region | Expect Value
Example 31.

The NOV31 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 31A.



Table 31A. NOV31 Sequence Analysis

SEQ ID NO: 129 605 bp

NOV31a,
CG116785-01 DNA
Sequence 


CTGACACCAGCACAGCAAACCCGCCGGGATCAAAGTGTACCAGTCGGCAGCATGGCTACGAAA
TGTGGGAATTGTGGACCCGGCTACTCCACCCCTCTGGAGGCCATGAAAGGACCCAGGGAAGAG 
ATCGTCTACCTGCCCTGCATTTACCGAAACACAGGCACTGAGGCCCCAGATTATCTGGCCACT 
GTGGATGTTGACCCCAAGTCTCCCCAGTATTGCCAGGTTAGGCGGGGCTTGGGCGCCAGCTAC 
TTTGAGACCATAGCTGCCCTCATCCCTGGCCCTGGGCCCCCCCTTCCCAGCTCCATCCTTCTT 
GGCCCTCCCTGGGGATGCTTGTGCACGCTCAACCTGGGACAAGGGGAGTGCTGAAATCCAGCC 


TGTGCCGTGCTTCCAAACCAAAATGAGTCCACAGGGGCGCCTCTTCCAAAAGTGGACAGAGGC


GTGGCCTGGGGGAGCACCACCTCTCCCCGCATCCTAGGTCATCCACCGGCTGCCCATGCCCAA 


CCTGAAGGACGAGCTGCATCACTCAGGATGGAACACCTGCAGCAGCTGCTTCGGTGATAGCAC


CAAGTCGCGCACCAGGCTGGTGCTGCCAGTCTCATCTC 




ORF Start: at 28 ORF Stop: at 343




SEQ ID NO: 130 105 aa MW at 11074.6kD


NOV31a, 

CG116785-01 Protein
Sequence 


DQSVPVGSMATKCGNCGPGYSTPLEAMKGPREEIVYLPCIYRNTGTEAPDYLATVDVDPKSPQ 
YCQVRRGLGASYFETIAALIPGPGPPLPSSILLGPPWGCLCT 
Further analysis of the NOV31a protein yielded the following properties shown in
Table 31B.

Table 31B. Protein Sequence Properties NOV31a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1873 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV31a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 31C.



Table 31C. Geneseq Results for NOV31a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV31a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAE22276 | Human selenium-binding protein (HSEBP) #1 - Homo sapiens, 492 aa. [US2002042066-A1, 11-APR-2002] | 1..67; 13..79 | 67/67 (100%); 67/67 (100%) | 1e-35
AAB57139 | Human prostate cancer antigen protein sequence SEQ ID NO:1717 - Homo sapiens, 499 aa. [WO200055174-A1, 21-SEP-2000] | 1..67; 20..86 | 67/67 (100%); 67/67 (100%) | 1e-35
AAB47946 | HSEBP - Homo sapiens, 472 aa. [US6312895-B1, 06-NOV-2001] | 9..67; 1..59 | 59/59 (100%); 59/59 (100%) | 5e-31
AAE22277 | Human selenium-binding protein (HSEBP) #2 - Homo sapiens, 472 aa. [US2002042066-A1, 11-APR-2002] | 9..67; 1..59 | 59/59 (100%); 59/59 (100%) | 5e-31
AAY68328 | Amyotrophic lateral sclerosis related p53 protein - Homo sapiens, 472 aa. [JP2000000095-A, 07-JAN-2000] | 9..67; 1..59 | 59/59 (100%); 59/59 (100%) | 5e-31



In a BLAST search of public sequence databases, the NOV31a protein was found to
have homology to the proteins shown in the BLASTP data in Table 31D.



Table 31D. Public BLASTP Results for NOV31a

Protein Accession Number | Protein/Organism/Length | NOV31a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q96GX7 | Similar to selenium binding protein 1 - Homo sapiens (Human), 472 aa. | 9..67; 1..59 | 59/59 (100%); 59/59 (100%) | 1e-30
Q13228 | Selenium-binding protein 1 - Homo sapiens (Human), 472 aa. | 9..67; 1..59 | 59/59 (100%); 59/59 (100%) | 1e-30
Q91X87 | Unknown (protein for MGC:18519) - Mus musculus (Mouse), 472 aa. | 9..77; 1..69 | 58/69 (84%); 59/69 (85%) | 4e-28
P17563 | Selenium-binding protein 1 (56 kDa selenium-binding protein) (SP56) - Mus musculus (Mouse), 472 aa. | 9..77; 1..69 | 58/69 (84%); 59/69 (85%) | 4e-28
Q8R1T6 | Selenium binding protein 2 - Mus musculus (Mouse), 472 aa. | 9..77; 1..69 | 57/69 (82%); 58/69 (83%) | 2e-27



PFam analysis predicts that the NOV31a protein contains the domains shown in the
Table 31E.



Table 31E. Domain Analysis of NOV31a

Pfam Domain | NOV31a Match Region | Identities; Similarities for the Matched Region | Expect Value
Example 32.

The NOV32 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 32A. 



Table 32A. NOV32 Sequence Analysis

SEQ ID NO: 131 2620 bp

NOV32a,
CG118927-01 DNA
Sequence


AATTCTTTCATTTCAGAGTTGAGCAAACTGTGTCTCAAGGAGGTGTGCGCTGCAGGTAGCTGA 


TGTGGCCAGGATTTGAACCCACATTTTATATTCTTTGCCATTAGATTCCATTTTTCTTATCTC 


TTATTTCTCAGCGCCAGGTCTCTTTTTGTTTCTGGGCCTCTTTCTTCCTTGGCCTCCTGCCTC 


CCTGTCTTCCCATGTTCTTGTATCTGCTGCTGCCTGTGTTTTTCTCATTGTGTCTCTGCCTAG 


CTCCCTCTGCCCAGGTTTCTATGTGTCTCTTTCTGTCAGAACTGTCTTCTATTTCTTCTCAGT 


CCCCTTCTCTGCATTTGTTACATCATCTCTTCATCTTAATGTATTAGGCTGGGGCTGTTTTGG 


GACTCCAGAGAGTTACTTTTTAACTAGCTACATCTGTCTTCAGGTGCCTGGCTTTCTGGCCTG 




AGGTGGAAGCAGAGGCCAGCCTCTCTCCGTCATCCCTATCTCCCCTGGGCTTACATCATGGCG 




GGGCTTTGAAAATCTCCCTCTGTGGCTTCTTGGAGCCGCATAGATGGGTGATCAGAGCTGGCT


CTGGAACCAGGCTGCTCCAGGAGTAAGGAGCCCAGTCTTTGCATGCAGTGTTGAAAAAGGTAA 


TGTCCCTCTTGTGCTGAGCGAGCACCTGGCACACAGCAGGGACCCAGGCAGTGGGGCTGTTAG 


GTTCCTTATCTCTCCTGAGCCTTGGGCTTCCGCTATCCTTGGGACGTCTGGTCTTCTGGCTTC 


CCCGGTTCTTCCTGCCGCCCTGGATGCGGTGACCTGCCAGCACCTGCCGCAGCCTTCGTCCGG 


GAGTCGCCCCATCTCTCCACGCAATCGGCCCTGTGCCCCTTGCTGCTGCAGCCGGGCACCATG 


TCGACCTCGTCCTTGAGGCGCCAGATGAAGAACATCGTCCACAACTACTCAGAGGCGGAGATC 
AAGGTTCGAGAGGCCACGAGCAATGACCCCTGGGGCCCATCCAGCTCCCTCATGTCAGAGATT 
GCCGACCTCACCTACAACGTTGTCGCCTTCTCGGAGATCATGAGCATGATCTGGAAGCGGCTC 
AATGACCATGGCAAGAACTGGCGTCACGTTTACAAGGCCATGACGCTGATGGAGTACCTCATC 
AAGACCGGCTCGGAGCGCGTGTCGCAGCAGTGCAAGGAGAACATGTACGCCGTGCAGACGCTG 
AAGGACTTCCAGTACGTGGACCGCGACGGCAAGGACCAGGGCGTGAACGTGCGTGAGAAAGCT


AAGCAGCTGGTGGCCCTGCTGCGCGACGAGGACCGGCTGCGGGAAGAGCGGGCGCACGCGCTC 
AAGACCAAGGAAAAGCTGGCACAGACCGCCACGGCCTCATCAGCAGCTGTGGGCTCAGGCCCC 
CCTCCCGAGGCGGAGCAGGCGTGGCCGCAGAGCAGCGGGGAGGAGGAGCTGCAGCTCCAGCTG 
GCCCTGGCCATGAGCAAGGAGGAGGCCGACCAGGAGGAGCGGATCCGTCGCGGGGATGACCTG 
CGGCTGCAGATGGCAATCGAGGAGAGCAAGAGGGAGACTGGGGGCAAGGAGGAGTCGTCCCTC 
ATGGACCTTGCTGACGTCTTCACGGCCCCAGCTCCTGCCCCGACCACAGACCCCTGGGGGGGC 
CCAGCACCCATGGCTGCTGCCGTCCCCACGGCTGCCCCCACCTCGGACCCCTGGGGCGGCCCC 
CCTGTCCCTCCAGCTGCTGATCCCTGGGGAGGTCCAGCCCCCACGCCGGCCTCTGGGGACCCC 
TGGAGGCCTGCTGCCCCTGCAGGACCCTCAGTTGACCCTTGGGGTGGGACCCCAGCCCCTGCA
GCTGGGGAGGGGCCCACGCCTGATCCATGGGGAAGTTCCGATGGTGGGGTCCCGGTCAGTGGG 
CCCTCAGCCTCCGATCCCTGGACACCGGCCCCGGCCTTCTCAGATCCCTGGGGAGGGTCACCT 
GCCAAGCCCAGCACCAATGGCACAACAGCAGCCGGGGGATTCGACACGGAGCCCGACGAGTTC 
TCTGACTTTGACCGACTCCGCACGGCACTGCCGACCTCCGGGAGCAGCGCAGGAGAGCTGGAG 
CTGCTGGCAGGAGAGGTGCCGGCCCGAAGCCCTGGGGCGTTTGACATGAGTGGGGTCAGGGGA 
TCTCTGGCTGAGGCTGTGGGCAGCCCCCCACCTGCAGCCACACCAACTCCCACGCCCCCCACC 
CGGAAGACGCCGGAGTCATTCCTGGGGCCCAATGCAGCCCTCGTCGACCTGGACTCGCTGGTG 
AGCCGGCCGGGCCCCACGCCGCCTGGAGCCAAGGCCTCCAACCCCTTCCTGCCAGGCGGAGGC 
CCAGCCACTGGCCCTTCCGTCACCAACCCCTTCCAGCCCGCGCCTCCCGCGACGCTCACCCTG 
AACCAGCTCCGTCTCAGTCCTGTGCCTCCCGTCCCTGGAGCGCCACCCACGTACATCTCTCCC 
CTTGGCGGGGGCCCTGGCCTGCCCCCCATGATGCCCCCGGGCCCCCCGGCCCCCAACACTAAT 
CCCTTCCTCCTATAATCCAGGGCGGAAGGGGGCCTGGCTCCATCCGGCTGCCCCATTCCGGCT 




CCCTGGGAGATCAGTGTTGTGAGTGCATGTGAAATGG 


ORF Start: ATG at 880 ORF Stop: TAA at 2533

SEQ ID NO: 132 551 aa MW at 57574.6kD


NOV32a, 

CG118927-01 Protein
Sequence


MSTSSLRRQMKNIVHNYSEAEIKVREATSNDPWGPSSSLMSEIADLTYNVVAFSEIMSMIWKR
LNDHGKNWRHVYKAMTLMEYLIKTGSERVSQQCKENMYAVQTLKDFQYVDRDGKDQGVNVREK

AKQLVALLRDEDRLREERAHALKTKEKLAQTATASSAAVGSGPPPEAEQAWPQSSGEEELQLQ 
LALAMSKEEADQEERIRRGDDLRLQMAIEESKRETGGKEESSLMDLADVFTAPAPAPTTDPWG 
GPAPMAAAVPTAAPTSDPWGGPPVPPAADPWGGPAPTPASGDPWRPAAPAGPSVDPWGGTPAP 
AAGEGPTPDPWGSSDGGVPVSGPSASDPWTPAPAFSDPWGGSPAKPSTNGTTAAGGFDTEPDE 
FSDFDRLRTALPTSGSSAGELELLAGEVPARSPGAFDMSGVRGSLAEAVGSPPPAATPTPTPP 
TRKTPESFLGPNAALVDLDSLVSRPGPTPPGAKASNPFLPGGGPATGPSVTNPFQPAPPATLT 
LNQLRLSPVPPVPGAPPTYISPLGGGPGLPPMMPPGPPAPNTNPFLL 


SEQ ID NO: 133 2449 bp

NOV32b,
CG118927-02 DNA
Sequence


AATTCTTTCATTTCAGAGTTGAGCAAACTGTGTCTCAAGGAGGTGTGCGCTGCAGGTAGCTGA 


TGTGGCCAGGATTTGAACCCACATTTTATATTCTTTGCCATTAGATTCCATTTTTCTTATCTC 


TTATTTCTCAGCGCCAGGTCTCTTTTTGTTTCTGGGCCTCTTTCTTCCTTGGCCTCCTGCCTC 


CCTGTCTTCCCATGTTCTTGTATCTGCTGCTGCCTGTGTTTTTCTCATTGTGTCTCTGCCTAG 


CTCCCTCTGCCCAGGTTTCTATGTGTCTCTTTCTGTCAGAACTGTCTTCTATTTCTTCTCAGT 


CCCCTTCTCTGCATTTGTTACATCATCTCTTCATCTTAATGTATTAGGCTGGGGCTGTTTTGG 


GACTCCAGAGAGTTACTTTTTAACTAGCTACATCTGTCTTCAGGTGCCTGGCTTTCTGGCCTG 


AGGTGGAAGCAGAGGCCAGCCTCTCTCCGTCATCCCTATCTCCCCTGGGCTTACATCATGGCG 


GGGCTTTGAAAATCTCCCTCTGTGGCTTCTTGGAGCCGCATAGATGGGTGATCAGAGCTGGCT 


CTGGAACCAGGCTGCTCCAGGAGTAAGGAGCCCAGTCTTTGCATGCAGTGTTGAAAAAGGTAA 


TGTCCCTCTTGTGCTGAGCGAGCACCTGGCACACAGCAGGGACCCAGGCAGTGGGGCTGTTAG 


GTTCCTTATCTCTCCTGAGCCTTGGGCTTCCGCTATCCTTGGGACGTCTGGTCTTCTGGCTTC 


CCCGGTTCTTCCTGCCGCCCTGGATGCGGTGACCTGCCAGCACCTGCCGCAGCCTTCGTCCGG 


GAGTCGCCCCATCTCTCCACGCAATCGGCCCTGTGCCCCTTGCTGCTGCAGCCGGGCACCATG 




TCGACCTCGTCCTTGAGGCGCCAGATGAAGAACATCGTCCACAACTACTCAGAGGCGGAGATC 
AAGGTTCGAGAGGCCACGAGCAATGACCCCTGGGGCCCATCCAGCTCCCTCATGTCAGAGATT 
GCCGACCTCACCTACAACGTTGTCGCCTTCTCGGAGATCATGAGCATGATCTGGAAGCGGCTC 
AATGACCATGGCAAGAACTGGCGTCACGTTTACAAGGCCATGACGCTGATGGAGTACCTCATC
AAGACCGGCTCGGAGCGCGTGTCGCAGCAGTGCAAGGAGAACATGTACGCCGTGCAGACGCTG 
AAGGACTTCCAGTACGTGGACCGCGACGGCAAGGACCAGGGCGTGAACGTGCGTGAGAAAGCT 
AAGCAGCTGGTGGCCCTGCTGCTGGCCATGAGCAAGGAGGAGGCCGACCAGGAGGAGCGGATC
CGTCGCGGGGATGACCTGCGGCTGCAGATGGCAATCGAGGAGAGCAAGAGGGAGACTGGGGGC 
AAGGAGGAGTCGTCCCTCATGGACCTTGCTGACGTCTTCACGGCCCCAGCTCCTGCCCCGACC 
ACAGACCCCTGGGGGGGCCCAGCACCCATGGCTGCTGCCGTCCCCACGGCTGCCCCCACCTCG 
GACCCCTGGGGCGGCCCCCCTGTCCCTCCAGCTGCTGATCCCTGGGGAGGTCCAGCCCCCACG 
CCGGCCTCTGGGGACCCCTGGAGGCCTGCTGCCCCTGCAGGACCCTCAGTTGACCCTTGGGGT 
GGGACCCCAGCCCCTGCAGCTGGGGAGGGGCCCACGCCTGATCCATGGGGAAGTTCCGATGGT 
GGGGTCCCGGTCAGTGGGCCCTCAGCCTCCGATCCCTGGACACCGGCCCCGGCCTTCTCAGAT 
CCCTGGGGAGGGTCACCTGCCAAGCCCAGCACCAATGGCACAACAGCAGCCGGGGGATTCGAC 
ACGGAGCCCGACGAGTTCTCTGACTTTGACCGACTCCGCACGGCACTGCCGACCTCCGGGAGC 


AGCGCAGGAGAGCTGGAGCTGCTGGCAGGAGAGGTGCCGGCCCGAAGCCCTGGGGCGTTTGAC 
ATGAGTGGGGTCAGGGGATCTCTGGCTGAGGCTGTGGGCAGCCCCCCACCTGCAGCCACACCA 
ACTCCCACGCCCCCCACCCGGAAGACGCCGGAGTCATTCCTGGGGCCCAATGCAGCCCTCGTC 
GACCTGGACTCGCTGGTGAGCCGGCCGGGCCCCACGCCGCCTGGAGCCAAGGCCTCCAACCCC 
TTCCTGCCAGGCGGAGGCCCAGCCACTGGCCCTTCCGTCACCAACCCCTTCCAGCCCGCGCCT 
CCCGCGACGCTCACCCTGAACCAGCTCCGTCTCAGTCCTGTGCCTCCCGTCCCTGGAGCGCCA 
CCCACGTACATCTCTCCCCTTGGCGGGGGCCCTGGCCTGCCCCCCATGATGCCCCCGGGCCCC 
CCGGCCCCCAACACTAATCCCTTCCTCCTATAATCCAGGGCGGAAGGGGGCCTGGCTCCATCC


GGCTGCCCCATTCCGGCTCCCTGGGAGATCAGTGTTGTGAGTGCATGTGAAATGG 




ORF Start: ATG at 880 ORF Stop: TAA at 2362




SEQ ID NO: 134 494 aa MW at 51422.0kD


NOV32b, 

CG118927-02 Protein
Sequence 


MSTSSLRRQMKNIVHNYSEAEIKVREATSNDPWGPSSSLMSEIADLTYNVVAFSEIMSMIWKR
LNDHGKNWRHVYKAMTLMEYLIKTGSERVSQQCKENMYAVQTLKDFQYVDRDGKDQGVNVREK
AKQLVALLLAMSKEEADQEERIRRGDDLRLQMAIEESKRETGGKEESSLMDLADVFTAPAPAP 
TTDPWGGPAPMAAAVPTAAPTSDPWGGPPVPPAADPWGGPAPTPASGDPWRPAAPAGPSVDPW 
GGTPAPAAGEGPTPDPWGSSDGGVPVSGPSASDPWTPAPAFSDPWGGSPAKPSTNGTTAAGGF
DTEPDEFSDFDRLRTALPTSGSSAGELELLAGEVPARSPGAFDMSGVRGSLAEAVGSPPPAAT 
PTPTPPTRKTPESFLGPNAALVDLDSLVSRPGPTPPGAKASNPFLPGGGPATGPSVTNPFQPA 
PPATLTLNQLRLSPVPPVPGAPPTYISPLGGGPGLPPMMPPGPPAPNTNPFLL 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 32B. 



Table 32B. Comparison of NOV32a against NOV32b.

Protein Sequence | NOV32a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV32b | 1..551; 1..494 | 360/551 (65%); 362/551 (65%)



Further analysis of the NOV32a protein yielded the following properties shown in 
Table 32C. 



Table 32C. Protein Sequence Properties NOV32a

PSort analysis: 0.4600 probability located in mitochondrial matrix space; 0.4500 probability located in cytoplasm; 0.1903 probability located in lysosome (lumen); 0.1562 probability located in mitochondrial inner membrane

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV32a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 32D.

Table 32D. Geneseq Results for NOV32a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV32a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB42049 | Human ORFX ORF1813 polypeptide sequence SEQ ID NO:3626 - Homo sapiens, 551 aa. [WO200058473-A2, 05-OCT-2000] | 1..551; 1..551 | 550/551 (99%); 550/551 (99%) | 0.0
AAB95100 | Human protein sequence SEQ ID NO:17064 - Homo sapiens, 576 aa. [EP1074617-A2, 07-FEB-2001] | 1..551; 1..576 | 551/576 (95%); 551/576 (95%) | 0.0
AAB24234 | Human vesicle associated protein 13 SEQ ID NO:13 - Homo sapiens, 576 aa. [WO200060082-A2, 12-OCT-2000] | 1..551; 1..576 | 551/576 (95%); 551/576 (95%) | 0.0
AAB93525 | Human protein sequence SEQ ID NO:12872 - Homo sapiens, 584 aa. [EP1074617-A2, 07-FEB-2001] | 1..551; 1..584 | 305/636 (47%); 367/636 (56%) | e-139
ABG12620 | Novel human diagnostic protein #12611 - Homo sapiens, 417 aa. [WO200175067-A2, 11-OCT-2001] | 2..348; 4..345 | 245/355 (69%); 253/355 (71%) | e-121



In a BLAST search of public sequence databases, the NOV32a protein was found to
have homology to the proteins shown in the BLASTP data in Table 32E.



Table 32E. Public BLASTP Results for NOV32a

Protein Accession Number | Protein/Organism/Length | NOV32a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9Y6I3 | EH domain-binding mitotic phosphoprotein - Homo sapiens (Human), 551 aa. | 1..551; 1..551 | 551/551 (100%); 551/551 (100%) | 0.0
Q9HA18 | CDNA FLJ12392 fis, clone MAMMA1002699, highly similar to Rattus norvegicus EH domain binding protein Epsin mRNA - Homo sapiens (Human), 576 aa. | 1..551; 1..576 | 551/576 (95%); 551/576 (95%) | 0.0
O88339 | EH domain binding protein Epsin - Rattus norvegicus (Rat), 575 aa. | 1..551; 1..575 | 521/576 (90%); 527/576 (91%) | 0.0
O13027 | Mitotic phosphoprotein 90 - Xenopus laevis (African clawed frog), 609 aa. | 10..551; 1..609 | 366/658 (55%); 407/658 (61%) | e-170
O95207 | Epsin 2a - Homo sapiens (Human), 584 aa. | 1..551; 1..584 | 304/636 (47%); 366/636 (56%) | e-13?



PFam analysis predicts that the NOV32a protein contains the domains shown in the
Table 32F.



Table 32F. Domain Analysis of NOV32a

Pfam Domain | NOV32a Match Region | Identities; Similarities for the Matched Region | Expect Value
ENTH | 17..140 | 69/131 (53%); 118/131 (90%) | 1.4e-67
UIM | 182..199 | 12/18 (67%); 16/18 (89%) | 0.014






Example 33. 

The NOV33 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 33A. 



Table 33A. NOV33 Sequence Analysis 




SEQ ID NO: 135 806 bp




NOV33a, 

CG118981-01 DNA
Sequence 


ATGACATGGCGGGCAGTGGCTGGCCGGGCCTCCTCATGTCCTGCCTGAAGGGTCCCCATGTCA 
TCCTCAAGATGGAGGCCATGAAGATTGTCCACCCTGAAAAGTTCCCTGAGCTACCGGCTGCCC 
CCTGCTTCCCGCCTGCTCCCCGGCCCACCCCAACTCTGGCACCCAAGCGTGCCTGGCCCTCAG 
ACACAGAGATCATTGTCAACCAGGCCTGTGGGGGGGACATGCCTGCCTTGGAAGGGGCACCCC 
ATACCCCGCCACTGCCACGGCGGCCCCGTAAGGGAAGCTCGGAGCTGGGCTTTCCCCGCGTGG 
CCCCAGAGGATGAGGTCATTGTGAATCAGTACGTGATTCGGCCTGGCCCCTCGGCCTCGGCGG 
CTTCTTCGGCGGCGGCAGGCGAGCCCCTGGAGTGCCCCACCTGTGGGCACTCCTACAATGTCA 
CCCAGCGGAGGCCCCGCGTGCTGTCCTGCCTGCACTCTGTGTGTGAGCAGTGCCTGCAGATTC 
TCTACGAGTCCTGCCCCAAGTACAAGTTCATCTCCTGCCCCACCTGCCGCCGTGAGACTGTGC 
TCTTCACCGACTACGGCCTGGCCGCGCTGGCTGTCAACACGTCCATCCTGAGCCGCCTGCCGC 
CTGAGGCGCTGACGGCCCCATCCGGGGGTCAGTGGGGGGCTGAGCCCGAGGGCAGCTGCTACC 
AGACCTTCCGGCAGTACTGTGGGGCCGCGTGCACCTGCCACGTGCGGAACCCACTGTCCGCCT 
GCTCCATCATGTAGTAGCGCCTGCCTGCCCGCCACTGCCCGCCATGTCAT 




ORF Start: ATG at 6 ORF Stop: TAG at 768




SEQ ID NO: 136 254 aa MW at 27284.2kD


NOV33a, 

CG118981-01 Protein
Sequence 


MAGSGWPGLLMSCLKGPHVILKMEAMKIVHPEKFPELPAAPCFPPAPRPTPTLAPKRAWPSDT 
EIIVNQACGGDMPALEGAPHTPPLPRRPRKGSSELGFPRVAPEDEVIVNQYVIRPGPSASAAS
SAAAGEPLECPTCGHSYNVTQRRPRVLSCLHSVCEQCLQILYESCPKYKFISCPTCRRETVLF 
TDYGLAALAVNTSILSRLPPEALTAPSGGQWGAEPEGSCYQTFRQYCGAACTCHVRNPLSACS 
IM 
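The relationship between the DNA sequence, the ORF Start position and the protein sequence in Table 33A can be illustrated by translating the first line of the NOV33a DNA sequence. The Python sketch below assumes the standard genetic code and a 1-based ORF Start; the names `CODON_TABLE` and `translate` are hypothetical and are not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate dna from the 1-based position orf_start.

    Translation stops at the first stop codon or when fewer than
    three nucleotides remain.
    """
    residues = []
    for i in range(orf_start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 63 nucleotides of the NOV33a DNA sequence; ORF Start is ATG at 6.
first_line = "ATGACATGGCGGGCAGTGGCTGGCCGGGCCTCCTCATGTCCTGCCTGAAGGGTCCCCATGTCA"
print(translate(first_line, 6))
```

Under these assumptions the fragment translates to MAGSGWPGLLMSCLKGPHV, which matches the first 19 residues of the NOV33a protein sequence.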




Further analysis of the NOV33a protein yielded the following properties shown in 
Table 33B. 
Table 33B. Protein Sequence Properties NOV33a

PSort analysis: 0.8950 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV33a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 33C.




Table 33C. Geneseq Results for NOV33a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV33a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAU16313 | Human novel secreted protein, Seq ID 1266 - Homo sapiens, 225 aa. [WO200155322-A2, 02-AUG-2001] | 30..254; 1..225 | 221/225 (98%); 221/225 (98%) | e-134
ABB05662 | Human signal transduction protein clone amy2 10M7 - Homo sapiens, 180 aa. [WO200198454-A2, 27-DEC-2001] | 75..254; 1..180 | 180/180 (100%); 180/180 (100%) | e-106
ABB91088 | Herbicidally active polypeptide SEQ ID NO 299 - Arabidopsis thaliana, 1579 aa. [WO200210210-A2, 07-FEB-2002] | 132..208; 2..75 | 31/77 (40%); 38/77 (49%) | 3e-06
AAU27735 | Mouse full-length polypeptide sequence #60 - Mus musculus, 771 aa. [WO200164834-A2, 07-SEP-2001] | 134..239; 48..163 | 39/124 (31%); 59/124 (47%) | 1e-04
ABG21275 | Novel human diagnostic protein #21266 - Homo sapiens, 774 aa. [WO200175067-A2, 11-OCT-2001] | 134..239; 44..159 | 39/124 (31%); 59/124 (47%) | 1e-04



In a BLAST search of public sequence databases, the NOV33a protein was found to
have homology to the proteins shown in the BLASTP data in Table 33D.



Table 33D. Public BLASTP Results for NOV33a

Protein Accession Number | Protein/Organism/Length | NOV33a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9H0X6 | Hypothetical 19.4 kDa protein - Homo sapiens (Human), 180 aa. | 75..254; 1..180 | 180/180 (100%); 180/180 (100%) | e-106
AAH30073 | Similar to hypothetical protein DKFZp761H1710 - Mus musculus (Mouse), 183 aa. | 75..254; 1..183 | 169/183 (92%); 172/183 (93%) | 3e-98
Q8QZS5 | Similar to unknown (Protein for MGC:4734) - Mus musculus (Mouse), 190 aa. | 135..207; 12..84 | 28/73 (38%); 41/73 (55%) | 5e-08
Q96D59 | Hypothetical 21.7 kDa protein - Homo sapiens (Human), 192 aa. | 135..207; 12..84 | 27/73 (36%); 41/73 (55%) | 1e-07
BAC03481 | CDNA FLJ33257 fis, clone ASTRO2005593 - Homo sapiens (Human), 247 aa. | 120..207; 5..91 | 36/89 (40%); 49/89 (54%) | 2e-07



PFam analysis predicts that the NOV33a protein contains the domains shown in the 
Table 33E. 



Table 33E. Domain Analysis of NOV33a

Pfam Domain | NOV33a Match Region | Identities; Similarities for the Matched Region | Expect Value
zf-C3HC4 | 136..182 | 15/55 (27%); 32/55 (58%) | 0.025




Example 34. 

The NOV34 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 34A. 



Table 34A. NOV34 Sequence Analysis

SEQ ID NO: 137 1955 bp

NOV34a,
CG119385-01 DNA
Sequence

GGAAGGTGTGAGGTTGCCGCCATGCCTGGCAGAACGGAGGGAGGCAGTTGGCTCCGGAATGCG 
GCCGCCGCAGATGTTCTCCGCAACCTTCCGGAAGTGGAATGGCGGGAGCCTCAGCATTGCTGC 
CCACCGACCCCCCGGAAGCGGAAACAGAATCCCCGCGTGCCCCTTCCTCACTACCCTCCAAAT 
CCCGCTGCAGCCATTGCCGCAGACACGATGCCGAAACGAAAGAAGCAGAATCATCACCAGCCA
CCGACACAGCAGCAGCCCCCGCTGCCCGAGCGGGAAGAGACTGGAGATGAGGAGGATGGGAGT 
CCCATCGCTCTTCACAGAGGTCCTCCAGGATCAAGGGGACCACTGATTCCACCACTGCTGAGT 
CTCCCACCTCCTCCTTGGGGTAGAGGCCCAATTCGGAGAGGGCTTGGCCCCAGGTCTAGCCCA 
TATGGTCGTGGTTGGTGGGGAGTCAATGCAGAACCTCCTTTTCCGGGGCCAGGCCATGGGGGT 
CCCACCAGGGGAAGCTTTCACAAGGAACAGAGAAACCCTCGAAGGCTCAAAAGCTGGTCTCTT 
ATCAAGAATACCTGCCCGCCCAAGGATGACCCCCAGGTTATGGAAGACAAATCCGACCGCCCT 
GTCTGCCGACATTTTGCCAAAAAGGGCCACTGTCGATATGAGGACCTCTGTGCCTTCTACCAT 
CCAGGCGTCAATGGACCTCCTCTGTGAGACTGTGCCTTCCCATCCAGGCTGGAAGGAGCTCTC 


TGTGACCTAGCGGCCATTTATTTCTCTGTAGCCCTATGATGGCTACTGTGAGGCTCTTCTAAC 


ACCCTCAGTCAGTGACACACCCATCCCATCCACCACTTCCCCCGTGTGGGGTCCAGAGTGGTG 


TTGCATCACTGGTGCGCGGCATACGCGCTTTCTTCTGATCCAGCCTGTAGAGACTCGCCTTTG 


GGACCCATCTTTGCTTCCTTTCAGTTGCCTCCTGGATCTTCTTTCCCGTCATCAAATGACTGC 


TGAACAGGAAACCTCTTTGGTGCTGTTTCTTGTGCATCTGTCCACCTGTTCCCCAGTATTGCC 


CTCAATTCCTGAGAGCCCTGGAGCGGTTTCCTACCATTCCCTTCTTTTAGCTGCTTGTTTTAA 


GTCCTTTTTATGTGACATTCCCTACCCCCAATGTTGTCAGCTGCTTGTGAAACTCAGCCAGGT 


TGTCTAACCTGGGGTCAAGTTTGGGTGACTGGTGCAGAGTTACTTCCTAAAAGGCCACTCTCC 




GATCACCGTGCTTTGTGAAACTGTGCATCCCCTTGTAGCCTTTCTCAGTGTCCGTGGCATTTT 


TGTGACTTCCCAGCACTAGAATAAGTTTTCCTGCCAAAATGAGTGAGGCGCTTGGTGCCCTCT 


GGACTTTCCCACTTCCCAACATGGGAGAATTGTGAACTTTCCATCAGACTGCCTCCCTGGCCC 


TCCCCATTCTTCTCCTGTTGGTTATTCTGAGTCTGACACAGACCCATGACATGTCTTATAAAG 


CCTCCAATGGCTTTATCCTACCTAGATCCCTTCCAGCCCATTTTAATTAGACTATGTCATTGT 


GAGGCCACCAG1 L.V.AI iLAl 1 KjAAJ. J. C 1 vj I (j AA 1 \_ iLLALLl ivjv-t-1 Al LT1 IGGGTAvjAAG 


CTGGACAGTACTGTTGCCCTCTTCCAATCCTCTTCCCCTACATCCCTGGCACTGGTTGTTTTC


TGTGAAAACAGCAGTGAACAGGTTCAGTTTTGAACTGGCCCTGAGGAAATGGGTCAGGAGTTG


TATTGGCAAGAGGGAGGGGTGAGAGCTGTTGGAGAACTGAGAATGAGGTTTTTTTTTTTTTTT 


TCTTTTTAACTTTTTTTATATTAGTAATAAATGCAGTGGAAACCAGCATTTTATTTAAAAAAA 


AA 


ORF Start: ATG at 22 ORF Stop: TGA at 718

SEQ ID NO: 138 232 aa MW at 25830.1kD

NOV34a,
CG119385-01 Protein
Sequence


MPGRTEGGSWLRNAAAADVLRNLPEVEWREPQHCCPPTPRKRKQNPRVPLPHYPPNPAAAIAA 
DTMPKRKKQNHHQPPTQQQPPLPEREETGDEEDGSPIALHRGPPGSRGPLIPPLLSLPPPPWG 
RGPIRRGLGPRSSPYGRGWWGVNAEPPFPGPGHGGPTRGSFHKEQRNPRRLKSWSLIKNTCPP 
KDDPQVMEDKSDRPVCRHFAKKGHCRYEDLCAFYHPGVNGPPL 



Further analysis of the NOV34a protein yielded the following properties shown in 
Table 34B. 



Table 34B. Protein Sequence Properties NOV34a

PSort analysis: 0.7000 probability located in nucleus; 0.2531 probability located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV34a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 34C.

Table 34C. Geneseq Results for NOV34a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV34a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM93213 | Human polypeptide, SEQ ID NO: 2614 - Homo sapiens, 167 aa. [EP1130094-A2, 05-SEP-2001] | 66..232; 1..167 | 166/167 (99%); 167/167 (99%) | e-105
AAU28194 | Novel human secretory protein, Seq ID No 363 - Homo sapiens, 940 aa. [WO200166689-A2, 13-SEP-2001] | 86..232; 799..939 | 55/167 (32%); 63/167 (36%) | 8e-13
AAU28382 | Novel human secretory protein, Seq ID No 739 - Homo sapiens, 968 aa. [WO200166689-A2, 13-SEP-2001] | 86..232; 826..967 | 56/171 (32%); 61/171 (34%) | 7e-12
AAM39141 | Human polypeptide SEQ ID NO 2286 - Homo sapiens, 707 aa. [WO200153312-A1, 26-JUL-2001] | 36..163; 46..176 | 36/131 (27%); 48/131 (36%) | 3e-05
AAM23916 | Human EST encoded protein SEQ ID NO: 1441 - Homo sapiens, 1690 aa. [WO200154477-A2, 02-AUG-2001] | 36..163; 1222..1357 | 49/140 (35%); 55/140 (39%) | 8e-05



In a BLAST search of public sequence databases, the NOV34a protein was found to
have homology to the proteins shown in the BLASTP data in Table 34D.



Table 34D. Public BLASTP Results for NOV34a

Protein Accession Number | Protein/Organism/Length | NOV34a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
P79522 | MHC class 1 region proline rich protein - Homo sapiens (Human), 253 aa. | 1..232; 1..253 | 232/253 (91%); 232/253 (91%) | e-145
Q96QB9 | CAT56 protein - Homo sapiens (Human), 188 aa. | 66..232; 1..188 | 167/188 (88%); 167/188 (88%) | e-101
Q91YI5 | Hypothetical 16.0 kDa protein - Mus musculus (Mouse), 143 aa. | 101..232; 12..143 | 113/132 (85%); 121/132 (91%) | 7e-69
O55000 | Hypothetical 92.8 kDa protein - Rattus norvegicus (Rat), 872 aa. | 50..232; 640..871 | 78/245 (31%); 94/245 (37%) | 7e-16
AAH29765 | Hypothetical protein - Mus musculus (Mouse), 623 aa (fragment). | 50..232; 375..622 | 78/261 (29%); 93/261 (34%) | 4e-14




PFam analysis predicts that the NOV34a protein contains the domains shown in the 
Table 34E. 



Table 34E. Domain Analysis of NOV34a

Pfam Domain | NOV34a Match Region | Identities; Similarities for the Matched Region | Expect Value
zf-CCCH | 200..226 | 10/27 (37%); 20/27 (74%) | 0.0031
Example 35.

The NOV35 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 35A. 



Table 35A. NOV35 Sequence Analysis

SEQ ID NO: 139 20566 bp

NOV35a,
CG119566-01 DNA
Sequence


CCACTACTACTCTGAAAAATGGCAGATGACGAAGACTATGAGGAGGTGGTGGAGTACTACACA 
GAAGAAGTGGTTTACGAAGAGGTGCCGGGAGAGACAATAACAAAAATTTATGAGACTACGACA 
ACAAGGACATCTGACTATGAGCAATCAGAAACTTCCAAACCAGCTCTGGCACAGCCAGCACTG 
GCACAGCCAGCATCAGCAAAGCCGGTGGAGAGGAGGAAGGTCATCCGGAAGAAAGTGGATCCT 
TCAAAGTTCATGACCCCCTACATTGCACACAGTCAGAAAATGCAGGATCTTTTTAGCCCAAAT 
AAATACAAGGAGAAGTTTGAGAAAACAAAAGGACAGCCATACGCCAGCACAACAGATACTCCA
GAACTTCGCAGAATCAAAAAAGTACAAGATCAACTCAGTGAGGTTAAGTATCGAATGGATGGT 
GATGTTGCTAAGACTATATGTCACGTAGATGAAAAAGCAAAGGATATTGAACATGCAAAGAAA 
GTGTCGCAGCAAGTCAGTAAGGTTTTATACAAGCAGAACTGGGAAGACACCAAGGATAAGTAC 
CTGCTTCCTCCTGATGCCCCTGAACTTGTCCAGGCCGTTAAGAACACCGCCATGTTCAGCAAG 

GAACTGAGGAGAGTTGCCCAGGCCCAGAAAGCTCTCAGTGATGTTGCCTACAAAAAAGGTCTC 
GCTGAACAGCAAGCTCAATTCACGCCTCTGGCCGATCCTCCAGATATAGAATTTGCCAAGAAA
GTAACCAATCAAGTGAGCAAGCAAAAATACAAAGAAGACTATGAAAATAAAATCAAAGGCAAA 
TGGAGTGAGACACCTTGCTTTGAAGTTGCAAATGCCAGAATGAATGCTGATAACATTAGCACA 
AGGAAATACCAGGAAGATTTTGAAAACATGAAAGACCAGATCTACTTCATGCAGACCGAAACA
CCAGAGTATAAAATGAATAAAAAAGCTGGTGTGGCAGCTAGCAAGGTAAAATACAAAGAAGAC 
TATGAAAAGAATAAAGGAAAAGCAGATTATAATGTGCTTCCTGCTTCAGAGAACCCACAGCTT 
AGGCAGCTGAAGGCAGCAGGAGATGCCCTAAGTGACAAACTATACAAGGAAAACTATGAAAAG 
ACAAAAGCAAAGAGCATAAATTACTGCGAGACCCCCAAATTCAAGCTCGATACTGTTCTGCAG
AACTTCAGTAGTGATAAAAAATATAAAGATTCCTACTTAAAAGATATTTTGGGACATTATGTA
GGCAGCTTCGAGGATCCATACCATTCACACTGCATGAAAGTCACAGCTCAAAACAGTGATAAA 
AACTACAAAGCAGAATACGAAGAAGACAGAGGCAAAGGCTTCTTCCCTCAGACCATAACTCAA 
GAATATGAAGCAATTAAGAAACTAGATCAGTGTAAAGACCACACCTACAAAGTCCATCCAGAT 
AAGACAAAATTCACCCAAGTTACAGACTCTCCTGTTCTGCTACAAGCCCAAGTCAATTCCAAA
CAACTGAGTGACTTAAATTACAAAGCAAAACATGAAAGTGAAAAGTTCAAGTGCCATATCCCC 
CCTGATACTCCTGCTTTTATCCAGCACAAAGTCAATGCCTATAACTTGAGTGATAATCTTTAT 
AAGCAAGACTGGGAGAAGAGCAAAGCCAAAAAGTTTGACATTAAAGTGGATGCCATTCCCCTG 
CTGGCAGCCAAAGCCAACACCAAGAACACCAGCGATGTGATGTACAAGAAAGACTATGAAAAA
AACAAAGGGAAAATGATTGGAGTCCTCAGCATTAATGACGATCCCAAGATGCTGCACTCCTTG 
AAGGTGGCCAAAAACCAGAGTGATAGATTATACAAGGAAAACTATGAGAAGACAAAGGCAAAG 
AGTATGAATTACTGTGAGACCCCAAAATATCAACTTGATACTCAGCTGAAGAACTTCAGTGAG 
GCTAGATATAAAGACTTATATGTAAAGGATGTTTTGGGACATTATGTAGGCAGCATGGAGGAC
CCATATCACACACACTGCATGAAAGTTGCAGCTCAAAACAGTGATAAAAGTTACAAAGCAGAA
TATGAAGAAGATAAAGGAAAATGCTATTTCCCTCAGACAATAACACAAGAATATGACGCAATC 
AAGAAGCTGGACCAGTGTAAAGATCATACCTACAAAGTTCATCCAGATAAGACCAAATTCACG
GCAGTCACTGATTCTCCTGTACTGTTGCAAGCCCAGCTCAACACGAAACAGCTTAGTGATCTG 
AATTACAAAGCAAAACATGAAGGTGAGAGGTTCAAGTGCCATATACCAGCAGATGCTCCACAG 
TTTATCCAACACAGAGTCAATGCCTATAATCTGAGTGATAATGTTTATAAGCAAGACTGGGAG 
AAGAGCAAAGCCAAGAAGTTTGACATTAAAGTGGACGCCATTCCCCTGTTGGCAGCCAAAGCC 
AACACCAAGAACACCAGCGATGTGATGTACAAGAAAGACTATGAAAAGAGCAAAGGGAAAATG 
ATTGGAGCCCTCAGCATTAATGACGATCCAAAGATGCTGCACTCCTTGAAGACAGCCAAAAAC 
CAGAGTGATCGCGAATATCGAAAAGATTATGAAAAGTCAAAAACTATCTACACGGCACCTCTT 
GATATGCTCCAAGTCACTCAAGCTAAGAAATCTCAGGCAATTGCCAGCGACGTTGATTATAAG 
CACATCTTACACAGTTACAGCTACCCCCCTGATAGCATCAATGTGGACCTTGCCAAGAAGGCA 
TATGCGCTGCAGAGCGATGTTGAATACAAAGCTGACTACAATAGCTGGATGAAAGGTTGTGGC 
TGGGTGCCTTTTGGGTCCTTAGAAATGGAAAAGGCAAAGCGAGCTTCAGACATCCTCAATGAG 
AAAAAATATCGCCAACATCCAGACACCCTCAAGTTTACCTCGATTGAAGATGCTCCAATTACA 
GTACAGTCTAAAATTAACCAGGCCCAGAGGAGTGATATCGCTTACAAAGCCAAAGGAGAGGAA 
ATTATTCACAATTACAACCTGCCACCAGACCTGCCCCAGTTCATCCAGGCTAAAGTTAATGCC 
TACAATATCAGTGAGAATATGTACAAAGCAGACTTGAAAGACTTGAGCAAGAAGGGATATGAC
CTGAGAACTGATGCGATTCCCATCAGAGCTGCCAAAGCTGCCAGGCAGGCGGCGAGTGACGTT 
CAGTACAAAAAAGACTATGAAAAGGCTAAAGGGAAAATGGTTGGCTTCCAAAGTCTTCAAGAT 
GACCCTAAACTGGTTCATTATATGAACGTGGCCAAGATACAATCAGATCGGGAGTATAAAAAA 
GACTATGAGAAGACAAAGTCCAAATACAACACGCCCCATGATATGTTCAATGTCGTGGCGGCT
AAGAAAGCCCAGGATGTGGTCAGCAATGTCAACTATAAGCATTCTCTCCATCATTACACCTAC
TTGCCTGACGCCATGGACCTGGAGCTGTCTAAGAACATGATGCAGATACAGAGTGATAACGTC 

tacaaggaagactacaacaactggatgaaaggcattggctggattcctattggcagtctcgac
gtcgaaaaagttaaaaaggccggtgatgctctgaatgaaaagaagtacaggcaacatccagac 
accctcaaatttaccagcattgtggactccccagttatggtccaggcaaaacagaacacgaag 
caagtcagtgatatcttatacaaggctaaaggagaagatgtgaaacataaatacaccatgagt 
cctgatcttcctcagtttctccaggccaagtgcaatgcttacagtataagtgacgtctgttat 
aaacgggattggcatgacttaatacgcaagggcaacaatgtgctgggcgatgctattcccatc
actgcagccaaggcatcgagaaacattgccagtgattataaatacaaggaagcttatgagaag 
tcaaagggaaagcatgtgggtttcagaagcctccaggatgatcccaagctggtccactatatg 
aatgtggcaaagctgcagtctgatcgtgaatacaagaagaactatgagaacaccaaaaccagc 
taccatacccctggggacatggttacgatcacagctgcaaagatggcccaggatgtcgctacc 
aatgtcaactacaaacagccattgcatcattacacatacctacctgacgccatgagtcttgag 
catacgaggaatgtcaatcaaattcagagtgataatgtgtataaagacgagtataacagcttc
ttgaagggcatcggatggatccctattggttccctggaggtggagaaggtcaagaaagcaggc 
gatgcattaaatgagaggaagtatcgacagcacccagataccgtcaagttcacaagtgtgcct 
gattccatgggcatgatgttggctcagcataacacaaagcagctaagtgatttgaactacaag 
gtagagggagagaaactgaagcacaagtatactattgaccctgaattgcctcagtttattcaa 
gccaaagtcaacgccctcaacatgagtgatgctcattataaagcagattggaagaaaaccatt 
cgcaagggctatgatttgagaccagatgccatcccaattgttgctgcaaaaagttcaaggaat 
attgctagtgattgcaaatataaggaggcctacgagaaagccaaaggcaagcaagttGgattt 

CTCAGTCTTCAGGATGATCCTAAACTGGTTCACTACATGAATGTGGCCAAAATCCAGTCTGAT 

CGTGAGTACAAAAAGGGCTATGAAGCCAGCAAGACCAAGTACCACACACCTCTGGATATGGTC 

AGTGTGACAGCTGCAAAGAAATCTCAGGAGGTTGCCACCAACGCCAACTACAGACAGTCATAC 

CACCACTACACTCTCCTGCCCGATGCCTTGAATGTGGAGCACTCCAGGAATGCCATGCAGATT 

CAGAGTGATAATCTGTACAAATCTGACTTCACCAATTGGATGAAAGGGATCGGCTGGGTGCCC 

ATAGAGTCCCTGGAGGTGGAGAAGGCAAAGAAAGCAGGAGAGATTCTTAGTGAGAAGAAGTAT 

CGCCAGCACCCCGAGAAGCTGAAGTTCACTTACGCCATGGACACAATGGAACAGGCACTTAAC 

AAGAGTAACAAACTGAACATGGACAAGAGGCTCTACACTGAAAAATGGAACAAGGACAAGACC 

ACCATTCATGTCATGCCTGACACACCGGATATTTTACTCTCCAGAGTAAACCAAATCACCATG 

AGTGATAAACTGTACAAAGCTGGCTGGGAAGAGGAAAAGAAGAAAGGATATGACCTGAGGCCT 

GATGCCATTGCAATAAAGGCTGCAAGAGCCTCTAGAGACATTGCCAGTGATTACAAATACAAG 

AAAGCCTATGAACAAGCCAAAGGGAAACACATTGGCTTCCGGAGCCTGGAAGATGACCCCAAG

CTGGTGCACTTCATGCAAGTGGCCAAGATGCAGTCAGACCGGGAATACAAGAAGGGATATGAG

AAATCCAAGACCTCCTTCCACACCCCGGTGGACATGCTCAGTGTGGTGGCAGCCAAGAAGTCT 

CAGGAAGTGGCCACCAATGCCAACTACAGGAACGTGATCCATACCTACAACATGCTTCCTGAT

GCCATGAGCTTTGAATTGGCCAAAAATATGATGCAGATTCAAAGTGATAATCAGTACAAGGCT 

GACTATGCTGACTTCATGAAGGGCATTGGATGGCTCCCTCTGGGCTCCCTGGAAGCAGAGAAA 

AACAAGAAAGCCATGGAGATTATTAGTGAAAAGAAGTACCGCCAGCACCCAGACACTTTGAAG 

TATTCCACACTCATGGACTCGATGAACATGGTTTTGGCCCAGAATAATGCAAAAATTATGAAC 

GAACATCTCTACAAACAAGCATGGGAGGCTGACAAAACCAAAGTCCACATCATGCCTGATATC

CCCCAGATTATTTTGGCAAAGGCAAATGCAATTAATATAAGTGATAAACTCTACAAACTTTCC 

TTGGAAGAGTCTAAAAAGAAAGGCTATGATCTCAGACCTGATGCAATTCCTATCAAAGCTGCC

AAGGCTTCCAGAGATATTGCAAGTGATTATAAATACAAGTACAATTATGAAAAAGGGAAGGGG

AAAATGGTTGGTTTCCGCAGTCTCGAGGATGATCCCAAATTAGTCCATTCCATGCAAGTGGCT 

AAGATGCAATCTGATCGGGAGTACAAGAAAAACTATGAGAACACAAAGACCAGCTACCACACC 

CCTGCCGACATGCTCAGTGTCACGGCTGCAAAGGATGCCCAAGCCAACATCACCAACACTAAC 

TACAAGCACCTGATTCACAAGTACATCCTCCTTCCAGATGCAATGAACATTGAGCTGACCAGG 

AATATGAATCGCATACAGAGTGATAATGAATATAAGCAAGATTACAATGAATGGTACAAAGGG 

CTTGGCTGGAGTCCAGCAGGTTCTCTGGAAGTGGAGAAGGCCAAGAAAGCAACTGAATATGCC 

AGTGATCAGAAATACCGCCAGCACCCGAGCAACTTCCAGTTTAAGAAGCTGACTGATTCCATG 

GACATGGTGCTTGCCAAGCAGAATGCACATACCATGAACAAGCATTTATACACCATTGATTGG 

AATAAAGATAAGACCAAGATTCATGTGATGCCTGATACACCAGATATTTTACAAGCCAAGCAG 

AATCAAACACTGTATAGTCAGAAACTCTATAAACTTGGATGGGAAGAAGCTTTGAAGAAAGGC 

TATGATCTCCCAGTTGATGCAATTTCTGTACAGCTAGCTAAAGCTTCAAGAGACATTGCTAGT

GATTATAAATACAAACAAGGCTACCGAAAGCAACTTGGCCACCATGTTGGATTCCGGAGTCTG

CAAGATGACCCAAAACTTGTGTTGTCCATGAATGTAGCCAAAATGCAGAGTGAAAGAGAATAC 

AAGAAGGACTTTGAGAAGTGGAAAACTAAGTTCTCCAGCCCAGTGGACATGTTGGGAGTGGTA 

CTGGCCAAGAAGTGTCAGGAGTTGGTTAGTGACGTGGACTACAAGAACTACCTGCATCAGTGG 

ACATGTCTGCCTGATCAGAACGATGTTGTGCAAGCTAAGAAAGTTTATGAACTGCAAAGTGAG 

AATCTATATAAATCTGACCTTGAGTGGCTGAGAGGCATAGGATGGAGTCCCTTGGGTTCTTTA 

GAGGCAGAAAAGAACAAGCGGGCTTCGGAAATCATCAGTGAGAAGAAATATCGTCAGCCTCCA 

GACAGAAACAAGTTCACCAGCATTCCTGATGCCATGGATATAGTTCTGGCAAAGACAAATGCC 

AAAAATAGGAGTGATAGACTTTATAGAGAAGCTTGGGACAAAGACAAGACTCAGATCCACATC 

ATGCCTGATACACCTGACATTGTTCTGGCTAAAGCAAACTTAATCAACACAAGTGATAAACTC 

TACCGAATGGGTTATGAGGAGCTGAAGAGAAAAGGTTACGATCTTCCTGTTGATGCCATACCA
ATCAAAGCAGCAAAAGCCTCCCGGGAAATTGCCAGTGAATACAAGTACAAGGAAGGCTTTCGC

AAGCAGCTCGGCCACCACATTGGTGCCCGGAACATTGAAGATGACCCCAAGATGATGTGGTCC 

ATGCATGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGACTTTGAGAAGTGGAAGACC 

AAGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGTTGGCCTATAAGTGCCAGACCTTAGTC 

AGCGACGTGGACTACAAGAACTACCTGCACCAGTGGACATGCCTGCCCGACCAGAGCGATGTC 

ATCCATGCTCGGCAGGCCTATGACCTCCAGAGCGATAATTTGTACAAGTCAGACCTTCAGTGG

CTAAAAGGCATTGGCTGGATGACTAGTGGTTCTCTCGAGGATGAGAAAAATAAACGAGCCACC 

CAGATTTTGAGTGACCATGTTTACCGTCAGCACCCAGATCAATTTAAGTTTTCCAGCCTTATG 

GATTCCATACCAATGGTTTTGGCAAAAAACAATGCTATTACCATGAATCATCGCCTCTATACA 

GAAGCTTGGGATAAAGATAAAACCACTGTCCACATTATGCCAGATACCCCTGAAGTTTTATTA 

GCTAAACAAAACAAAGTAAATTACAGTGAGAAATTGTATAAGCTTGGCCTAGAAGAAGCCAAG

AGGAAAGGTTATGACATGCGGGTAGATGCCATTCCTATCAAGGCAGCCAAGGCCTCCAGAGAT 

ATTGCAAGTGAATTCAAGTACAAAGAAGGCTATCGTAAGCAGCTCGGCCACCACATTGGTGCC 

CGAGCTATACGTGATGACCCCAAGATGATGTGGTCCATGCACGTGGCCAAGATCCAGAGTGAC

AGGGAGTACAAGAAGGACTTTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTG 

GGGGTGGTGCTGGCCAAGAAGTGCCAGACCTTAGTCAGCGATGTGGACTACAAGAACTACCTG

CACCAGTGGACATGCCTGCCCGACCAGAGCGACGTCATCCATGCTCGGCAGGCCTATGACCTC

CAGAGCGATAATATGTACAAGTCTGATCTCCAGTGGATGAGAGGCATTGGCTGGGTGTCCATT 

GGCTCTTTGGATGTGGAAAAATGCAAAAGGGCAACTGAAATTTTGAGTGATAAAATCTATCGC

CAGCCTCCAGACAGATTCAAATTTACCAGTGTGACTGACTCTCTGGAACAAGTGCTGGCCAAG 

AACAATGCTCTCAACATGAATAAGCGTTTATACACAGAGGCCTGGGACAAAGACAAGACTCAA 

ATTCACATAATGCCTGATACACCAGAGATTATGTTGGCAAGGCAGAACAAAATCAACTACAGT 

GAGACTCTATACAAACTTGCCAATGAAGAAGCAAAAAAGAAAGGCTACGACTTGCGAAGTGAC 

GCCATCCCCATCGTGGCTGCCAAGGCCTCCAGGGACGTTATCAGTGATTACAAATACAAAGAT

GGTTACCGCAAGCAGCTCGGCCACCACATTGGAGCCCGGAACATTGAAGATGACCCCAAGATG 

ATGTGGTCCATGCATGTGGCCAAGATCCAGAGTGACAGGGAGTATAAGAAGGACTTTGAGAAG 

TGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTGGGAGTGGTGTTAGCCAAGAAGTGCCAG 

ACCTTAGTCAGCGATGTGGACTACAAGAACTACCTGCACGAGTGGACGTGCCTGCCCGACCAG 

AATGATGTCATCCATGCTCGGCAGGCCTATGACCTCCAGAGCGATAACATTTACAAATCTGAT 

CTCCAGTGGCTGAGAGGCATTGGCTGGGTCCCCATTGGGTCTATGGATGTGGTCAAGTGCAAG

AGAGCTGCTGAAATACTGAGTGATAACATCTACCGCCAGCCTCCGGACAAGCTGAAATTTACC 

AGTGTGACTGACTCTCTAGAGCAGGTGCTGGCCAAGAACAATGCTCTCAATATGAACAAGCGC 

TTATACACAGAAGCCTGGGACAAAGACAAGACCCAAGTCCATATTATGCCTGATACACCTGAA 

ATCATGTTGGCAAGACAAAATAAAATAAATTATAGTGAGAGCCTCTATCGTCAGGCCATGGAA 

GAAGCCAAGAAAGAAGGCTATGACTTGAGAAGTGATGCCATTCCCATTGTGGCTGCCAAGGCC 

TCTCGGGATATTGCCAGTGATTACAAATACAAAGAAGCATATCGTAAGCAGTTGGGTCACCAC 

ATTGGCGCCCGAGCAGTACACGATGACCCCAAGATAATGTGGTCCCTCCACATTGCCAAAGTG 

CAGAGTGACCGTGAGTACAAGAAAGATTTTGAGAAATACAAGACAAGGTACAGCAGCCCAGTG 

GACATGCTTGGTATCGTTTTGGCCAAGAAGTGTCAGACCTTGGTCAGCGATGTGGACTATAAA 

CATCCTCTGCATGAATGCATCTGCCTGCCCGACCAGAATGACATCATTCATGCACGGAAAGCC 

TATGACCTCCAGAGTGACAATTTGTATAAGTCAGACCTTGAATGGATGAAAGGCATTGGCTGG 

GTTCCGATTGATTCCTTGGAAGTTGTTAGGGCCAAGAGAGCTGGAGAATTACTTAGTGATACT 

ATCTACCGTCAGCGTCCAGAAACGCTGAAATTTACCAGTATAACGGACACTCCGGAGCAGGTG

CTGGCAAAAAACAATGCTTTAAACATGAATAAGCGCTTATATACTGAAGCCTGGGACAATGAC 

AAGAAAACTATTCATGTCATGCCTGATACACCAGAAATCATGTTAGCCAAACTCAACCGAATA

AACTACAGTGATAAACTCTATAAACTTGCTTTGGAAGAGTCCAAGAAGGAAGGCTATGACTTG

CGTCTGGATGCCATTCCAATCCAAGCAGCCAAGGCTTCAAGAGATATTGCTAGTGATTACAAG 

TACAAGGAAGGCTACCGCAAACAGCTTGGCCACCATATTGGGGCCCGGAACATTAAGGATGAC 

CCGAAGATGATGTGGTCCATCCATGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGAG 

TTTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGCTGGCCAAG 

AAGTGTCAGATCCTTGTAAGCGACATAGACTACAAGCATCCCCTGCATGAATGGACCTGCCTG 

CCTGATCAGAATGACGTCATTCAGGCTCGGAAGGCCTATGACCTGCAGAGTGATGCTATTTAC 

AAATCTGATCTTGAGTGGCTGAGAGGCATAGGATGGGTTCCCATTGGCTCTGTAGAGGTCGAG 

AAAGTGAAGAGAGCTGGAGAAATCCTGAGTGACAGGAAGTATCGCCAGCCTGCAGACCAGCTC 

AAATTCACATGCATTACCGACACTCCGGAAATTGTCCTAGCAAAGAATAATGCCCTGACAATG 

AGCAAGCATTTATACACAGAAGCTTGGGATGCTGACAAAACCTCCATCCACGTGATGCCAGAC 

ACCCCAGATATCCTGCTGGCCAAGAGTAATTCTGCCAATATCAGCCAAAAACTTTACACCAAG 

GGATGGGATGAATCAAAGATGAAGGACTATGATCTGAGAGCAGATGCTATTTCCATCAAAAGT 

GCCAAGGCCTCCAGGGACATCGCCAGTGACTACAAATACAAGGAAGCCTATGAGAAACAGAAA 

GGCCACCACATTGGAGCCCAGAGCATTGAAGATGATCCCAAGATTATGTGTGCCATACATGCA

GAAAAAATTCAAAGTGAAAGGGAGTACAAGAAGGAATTCCAAAAGTGGAAAACCAAGTTCTCT 

AGCCCAGTGGACATGTTAAGCATCTTGCTGGCCAAGAAATGTCAGACTTTGGTCACTGACATT 

TATTATCGCAATTACCTGCATGAATGGACATGCATGCCGGATCAAAACGACATTATCCAAGCA 

AAAAAGGCCTATGACCTGCAGAGTGATGCCCTCTACAAGGCTGACTTGGAGTGGTTGCGTGGC

ATTGGCTGGATGCCCCAAGGGTCTCCTGAAGTGTTGAGAGTCAAAAACGCCCAGAATATCTTT 

TGTGACAGTGTCTATCGGACGCCTGTGGTGAACCTTAAGTACACAAGCATTGTTGACACACCT
GAAGTGGTCCTTGCTAAATCAAATGCTGAAAATATTAGTATTCCAAAGTACAGAGAGGTTTGG

GACAAGGATAAAACTTCAATACACATAATGCCAGATACTCCAGAAATTAATCTCGCTAGAGCA 

AATGCTCTTAATGTGAGCAATAAACTTTACCGTGAGGGCTGGGATGAAATGAAGGCGGGCTGf 

GATGTCCGGCTGGATGCCATCCCCATCCAGGCTGCCAAGGCCTCCAGGGAGATTGCCAGTGAC 

TATAAATATAAGCTTGACCATGAGAAGCAGAAGGGACACTACGTGGGCACCCTCACAGCCAGG 

GATGACAACAAGATCCGCTGGGCCCTCATAGCTGACAAGCTCCAGAATGAACGAGAGTACCGG

CTGGACTGGGCCAAATGGAAGGCCAAGATCCAGAGCCCTGTGGACATGCTTTCCATCCTGCAC 

TCTAAAAATTCCCAGGCTCTGGTCAGTGACATGGATTACCGCAATTACCTGCACCAGTGGACC

TGCATGCCCGACCAGAACGATGTGATTCAGGCCAAGAAGGCCTACGAACTGCAGAGCGATAAT 

GTTTACAAGGCTGACTTGGAATGGTTGCGTGGAATTGGGTGGATGCCAAATGACTCCGTGTCC 

GTCAATCATGCCAAACATGCCGCGGACATCTTCAGTGAGAAAAAATATCGCACAAAAATAGAA 

ACTCTCAACTTTACGCCTGTGGATGACAGAGTTGATTATGTGACAGCGAAACAAAGTGGCGAG 

ATCCTCGATGATATTAAATACCGGAAAGACTGGAATGCCACCAAATCAAAGTACACCCTCACA 

GAAACCCCCCTGCTGCACACTGCCCAGGAGGCTGCTAGGATACTGGACCAGTATCTCTACAAG 

GAAGGCTGGGAGAGACAAAAAGCCACAGGTTACATTTTGCCTCCAGATGCTGTGCCATTTGTT 

CATGCCCATCACTGCAATGACGTTCAGAGTGAGCTGAAATACAAAGCTGAACATGTGAAGCAA 

AAAGGTCATTATGTTGGTGTCCCGACGATGAGAGATGATCCTAAGCTGGTTTGGTTTGAGCAT 

GCAGGCCAGATTCAGAATGAGAGACTATACAAAGAGGACTATCACAAAACAAAGGCCAAAATC

AATATACCTGCTGATATGGTGTCAGTCTTGGCCGCCAAGCAGGGGCAGACCCTTGTCAGTGAT 

ATTGATTATCGTAATTACTTGCACCAATGGATGTGTCATCCTGACCAGAACGATGTTATTCAG 

GCAAGAAAGGCCTATGACCTACAGAGTGATAATGTCTACAGAGCTGACCTGGAGTGGCTCCGA 

GGCATTGGCTGGATCCCACTGGATTCTGTGGACCATGTAAGGGTTACTAAGAACCAGGAAATG 

ATGAGTCAGATCAAATATAAGAAAAATGCCCTTGAAAACTATCCTAACTTTACAAGTGTGGTG 

GATCCTCCAGAGATTGTTTTAGCCAAGATTAATTCTGTCAATCAAAGTGATGTAAAATATAAA

GAAACATTTAATAAAGCAAAGGGCAAATATACGTTTTCACCAGATACACCACATATCTCCCAC 

TCCAAAGACATGGGAAAACTCTACAGTACTATACTGTATAAAGGGGCGTGGGAGGGCACCAAG 

GCCTATGGCTACACCCTGGATGAGCGCTACATTCCCATTGTTGGAGCCAAGCATGCTGATCTG

GTGAACAGTGAGCTTAAATACAAAGAGACATATGAGAAGCAGAAAGGTCACTACCTGGCTGGA 

AAAGTGATCGGTGAATTCCCTGGTGTGGTTCACTGTCTGGATTTCCAAAAGATGAGGAGTGCG 

TTGAACTACAGAAAACATTATGAGGATACCAAAGCAAATGTTCATATCCCCAATGACATGATG 

AATCACGTGCTGGCTAAAAGGTGCCAGTACATCCTCAGTGACCTGGAGTATCGACACTATTTC 

CACCAGTGGACGTCTCTTCTGGAAGAACCCAATGTTATACGCGTCCGAAACGCCCAGGAGATC 

TTGAGTGATAATGTGTATAAAGATGACCTGAATTGGTTGAAAGGCATTGGTTGCTACGTTTGG

GATACACCCCAAATCCTCCATGCCAAGAAATCATACGACCTTCAGAGTCAGCTACAATATACA 

GCAGCAGGTAAAGAAAATCTACAAAACTATAATCTGGTCACAGACACGCCCCTCTATGTGACT 

GCTGTTCAGAGTGGCATTAATGCCAGTGAGGTAAAATATAAAGAAAATTATCATCAGATTAAG

GACAAATACACAACAGTTCTAGAAACAGTGGATTATGACAGAACCAGAAACCTGAAGAATCTT 

TACAGCAGTAACCTGTACAAGGAGGCCTGGGATAGAGTGAAAGCCACCAGCTACATCCTGCCT 

TCCAGCACCTTGTCCCTGACACACGCCAAGAACCAGAAGCATCTGGCCAGCCATATCAAATAT

CGGGAAGAATATGAAAAGTTCAAAGCTCTTTATACGTTACCAAGAAGTGTTGACGATGATCCG

AACACAGCACGGTGCCTCCGAGTTGGCAAGCTTAACATCGATCGCCTGTACAGATCAGTTTAT 

GAAAAGAACAAGATGAAAATCCACATCGTGCCCGACATGGTAGAGATGGTTACTGCCAAGGAT

TCCCAGAAGAAAGTCAGTGAGATTGATTACCGCCTGCGCCTCCACGAATGGATTTGCCACCCC 

GACTTGCAAGTCAATGATCACGTCAGGAAAGTCACAGATCAGATCAGCGATATTGTATACAAG 

GATGACCTCAACTGGCTGAAAGGCATTGGTTGCTACGTCTGGGACACTCCTGAAATCCTCCAT

GCCAAGCATGCTTATGATCTACGTGATGATATCAAGTATAAAGCTCACATGTTGAAAACAAGG 

AATGACTACAAGCTTGTCACAGATACACCAGTCTACGTGCAGGCTGTCAAAAGTGGGAAACAG 

CTAAGTGACGCTGTCTACCACTATGACTATGTGCACAGTGTCAGAGGCAAAGTGGCTCCAACT 

ACCAAAACCGTGGATCTGGACCGGGCCCTTCATGCATACAAGCTCCAGAGTTCGAATCTATAC 

AAAACCAGCCTGCGCACCCTGCCCACTGGATATAGACTTCCAGGTGACACTCCTCACTTCAAA 

CACATCAAGGACACCCGTTACATGAGCAGTTATTTCAAGTACAAAGAAGCCTATGAACACACC 

AAGGCATATGGGTATACACTTGGCCCCAAAGATGTTCCATTTGTCCACGTCCGGAGAGTCAAC 

AATGTTACCAGCGAGAGACTGTATCGGGAATTGTACCACAAACTGAAAGACAAGATCCATACA

ACTCCCGATCCCCCTGAGATCCGCCAAGTCAAGAAGACACAAGAGGCTGTCAGTGAGTTGATC 

TACAAATCAGACTTCTTCAAGATGCAGGGCCACATGATCTCTCTGCCATACACACCCCAAGTG 

ATCCATTGCCGCTATGTGGGAGACATCACCAGTGATATTAAATACAAAGAGGACTTGCAGGTC 

CTGAAGGGATTTGGCTGCTTCCTGTATGACACTCCTGACATGGTCCGCTCCCGGCACCTGCGG 

AAGCTCTGGTCTAATTACCTATACACTGATAAGGCAAGGGAGATGCGAGACAAATACAAAGTG 

GTGCTTGACACTCCAGAATACAGAAAAGTGCAAGAACTGAAGACACATCTGAGTGAGCTGGTC 

TACAGAGCTGCAGGCAAGAAGCAGAAGTCAATCTTTACTTCAGTTCCTGATACTCCTGATCTT 

TTAAGAGCCAAGCGAGGGCAGAAGCTTCAGAGTCAGTATCTGTATGTTGAACTTGCCACCAAA 

GAGAGACCCCATCATCACGCTGGAAACCAGACCACAGCCTTGAAGCATGCTAAAGACGTGAAG 

GACATGGTCAGTGAGAAAAAGTACAAGATTCAATATGAAAAGATGAAAGACAAGTACACTCCG 

GTTCCAGATACGCCAATCCTCATCAGAGCCAAGAGGGCTTACTGGAATGCCAGTGATCTACGC 

TACAAAGAAACATTTCAAAAGACCAAAGGGAAATACCACACGGTGAAAGATGCCCTAGACATT 

GTCTATCATCGCAAAGTCACAGATGACATCAGTAAAATAAAATACAAGGAGAACTACATGAGC
CAGTTGGGTATCTGGAGGTCCATTCCTGATCGTCCAGAGCATTTCCACCACCGAGCAGTCACT
GACACAGTCAGTGATGTAAAATATAAAGAAGACTTGACTTGGCTTAAAGGCATTGGTTGCTAT

GCCTATGATACCCCTGATTTCACTCTGGCTGAAAAGAACAAGACTCTCTACAGCAAGTATAAG
TATAAAGAAGTATTTGAAAGGACAAAGTCAGATTTCAAGTATGTTGCCGACTCTCCGATCAAT

AGGCATTTCAAGTATGCAACTCAATTGATGAATGAGAAAAAATACAGAGCTGATTATGAGCAG
CGGAAAGATAAATACCACCTGGTAGTCGATGAGCCTAGACATCTGCTGGCTAAGACCCGCAGC 
GACCAGATCAGTCAGATCAAATACAGGAAAAACTATGAAAAATCAAAGGACAAATTTACCTCA 
ATTGTGGATACTCCAGAACACCTGCGTACTACAAAAGTCAACAAACAAATCAGCGATATCCTT 
TATAAATTGGAATACAACAAGGCCAAACCCAGAGGCTACACCACAATCCACGACACGCCCATG 
TTGCTGCATGTCCGCAAGGTTAAAGATGAAGTCAGTGATCTGAAATACAAAGAAGTATACCAA 
AGAAATAAATCCAACTGCACCATTGAGCCAGATGCTGTTCATATCAAAGCAGCCAAGGACGCC 
TACAAAGTCAACACCAATCTGGACTATAAGAAACAGTACGAAGCCAACAAAGCCCACTGGAAG 
TGGACTCCTGACCGACCGGACTTCCTCCAGGCTGCCAAGTCATCCCTGCAGCAAAGCGATTTT 
GAATATAAGCTGGACCGGGAGTTCCTCAAGGGTTGCAAGCTTTCTGTCACTGATGACAAAAAC 
ACGGTGCTCGCCCTCAGGAATACTTTAATAGAAAGTGATCTGAAATACAAAGAGAAACATGTC 
AAGGAAAGAGGAACCTGTCATGCCGTACCTGACACGCCTCAGATCCTGCTGGCGAAGACTGTC 
AGCAACCTGGTGTCTGAGAACAAGTACAAGGACCATGTCAAGAAGCACTTGGCACAGGGCTCA 
TACACAACACTACCAGAGACCCGGGACACTGTTCACGTCAAGGAAGTGACCAAGCATGTCAGT 
GATACAAATTACAAAAAGAAGTTTGTCAAGGAGAAAGGAAAATCCAACTACTCCATCATGCTG 
GAGCCACCAGAGGTGAAACATGCTATGGAAGTGGCCAAGAAGCAAAGTGATGTCGCTTACAGA 
AAAGATGCCAAAGAGAAGCTGCATTACACCACAGTGGCTGATCGACCAGACATCAAGAAGGCC
ACACAGGCAGCCAAACAGGCCAGTGAGGTGGAGTACAGAGCCAAGCACCGCAAGGAAGGCAGC 
CATGGCTTAAGCATGCTCGGTCGCCCAGACATAGAAATGGCCAAGAAGGCAGCCAAGCTGAGC
AGCCAGGTTAAATACCGAGAAAATTTCGATAAAGAAAAGGGCAAGACACCAAAATACAATCCA
AAAGACAGCCAGCTCTACAAAGTCATGAAAGATGCTAATAATCTTGCAAGTGAGGTTAAATAC 
AAGGCTGACCTGAAGAAACTTCACAAACCCGTGACTGACATGAAGGAGTCTCTGATCATGAAT 
CATGTCCTGAATACAAGCCAACTTGCCAGTTCTTACCAGTACAAGAAGAAGTATGAGAAGAGT
AAAGGCCACTACCACACCATACCCGATAATCTGGAGCAGCTTCACCTAAAAGAGGCCACAGAA
TTACAGAGTATAGTGAAATACAAAGAAAAGTATGAAAAGGAACGAGGAAAACCCATGCTGGAC 
TTTGAAACACCAACGTACATCACTGCCAAAGAGTCTCAGCAGATGCAGAGTGGGAAAGAATAT 
AGGAAAGATTATGAAGAGTCCATTAAAGGCAGAAACCTGACTGGCCTGGAGGTCACGCCAGCT 
TTGTTACATGTCAAATATGCAACTAAAATAGCAAGCGAGAAAGAGTACAGGAAAGATCTAGAG
GAAAGCATCCGTGGGAAGGGCCTCACTGAAATGGAAGATACACCTGACATGCTAAGAGCAAAG 
AATGCCACTCAAATCCTCAATGAGAAAGAATATAAGCGAGACCTGGAACTGGAAGTCAAAGGA
AGAGGCCTGAATGCCATGGCCAATGAAACTCCGGATTTTATGAGGGCCAGGAATGCTACTGAT 
ATTGCCAGTCAGATTAAGTATAAGCAATCAGCAGAAATGGAGAAAGCCAATTTCACTTCTGTG 
GTTGATACTCCAGAGATCATTCATGCCCAACAAGTCAAGAATCTTTCAAGCCAGAAAAAGTAC 
AAGGAAGATGCTGAGAAGTCCATGTCGTATTATGAGACTGTTTTGGACACCCCAGAGATACAG 
AGAGTCCGGGAGAACCAAAAGAACTTCAGCCTTCTCCAATACCAGTGTGACCTTAAAAACAGT 
AAAGGAAAAATTACAGTTGTTCAAGACACGCCAGAAATACTGCGTGTAAAAGAAAATCAGAAG

AATTTCAGCTCGGTTTTATATAAAGAGGATGTCTCACCAGGAACGGCTATCGGAAAGACACCT 
GAGATGATGAGAGTGAAACAAACACAGGACCACATTAGCTCGGTGAAGTATAAGGAAGCAATA 
GGACAAGGAACTCCAATCCCTGACCTGCCTGAAGTGAAACGTGTGAAGGAGACGCAGAAGCAC 
ATTAGCTCGGTTATGTACAAAGAAAACTTGGGAACAGGAATTCCAATCCCCATCACTCCTGAG
ATTGAGAGAGTCAAACACAATCAAGAAAACCTTAGCTCGGTGTTATACAAAGAAAACATGGGC
AAGGGAACCCCTTTACCTGTCACTCCCGAGATGGAAAGAGTCAAACACAATCAAGAAAATATT
AGCTCGGTGTTATACAAAGAAAACATGGGCAAGGGAACCCCTCTACCTGTCACTCCTGAGATG

gagagagtcaaacacaatcaagaaaatattagctcggtgttatacaaagaaaacatgggcaag 
ggaactcctttacctgtcactcccgagatggagcgagtcaaacacaatcaagaaaatattagc 
tcggttttgtacaaagaaaatgtggggaaagccaccgcaaccactgtgactcctgagatgcag 
agagtcaaacgcaatcaagaaaactttagctcggtgttatacaaagagaacctggggaaagca 
acccccacaccctttactcccgagatggagcgagtcaaacgcaatcaagaaaatattagctcg 
gttttgtacaaagagaacatgagaaaagcaactccgacacctgttactccagagatggagaga 
gctaagcgcaaccaagaaaacattagctcggttctttattctgatagtttccggaaacaaata
caaggcaaagctgcctatgtattggatacccccgagatgagacgggtgagggagacccaacgg 
cacatctcaacggtgaaatatcatgaagactttgagaaacacaagggttgcttcacaccagtg 
gtgacagatcctatcactgaacgagtaaagaagaacatgcaggacttcagtgacattaactac 
cgaggtattcagaggaaagtggtagaaatggaacaaaaacggaatgaccaagatcaggagact 
attacaggtttacgtgtctggcgtactaatcctggttcggtttttgactatgatccagcagaa 
gacaacatccagtcccgaagcttacacatgattaatgtccaagctcagcgccggagccgggag 
cagtcacgatctgccagtgcactaagcgtcagtgggggtgaggagaagtctgagcattcagaa 
gcaccagaccaccacctttcgacttacagcgacgggggtgtctttgcagtctcaacagcttac 
aaacatgcaaaaaccacagagctcccacaacaacgatcatcttcagttgctacccaacagaca 
acggtatcttccatcccatctcatccatctactgctggaaaaatcttccgtgccatgtatgac 
tatatggctgctgatgcagatgaggtgtccttcaaggatggagatgccatcataaatgttcaa 
gcaattgatgaaggctggatgtatggcactgtgcagaggactggcaggaccggaatgctccca
GCCAACTACGTTGAAGCTATTTAGGCATTTCAAAGCATCACACTTGTCTGCAGGACTTACAGA


TCCTGCAGTCAATGTTTCGGTTTAGACTCTCCACTGTTACCTAAGTTCTCAAGCTGCCTATGG 


TTTTTCTGTGTCAATGTGATTTATGGTAGTACCATCCTTTCTCCTTTGGGTTTTAAAATAAGT 


TGCAGAACAGACACTTTAAAAGCTTCTGCAATATTATTTCTGTGCCTAGAGTCTTTCTCCATT 




ATAAACATGTTTTAACATTATTTCTTTTCTAAAACAGGGATTTTGAATATGCCAAACACATTA 




AAGGAAAAATAGCAGAGATGTTCACCTTTTCCTTGCTGATTGCTAATGCTTATTATTTCTAAT 


TCAGTTCTGAAGTTATAAACTTATAATCAATACAAACCAGCAACTAATAAAACCTCTAATTCT


GCAAAAAAAAAAAAAAAAAAAAAAGTCG 




ORF Start: ATG at 19 | ORF Stop: TAG at 20119




SEQ ID NO: 140 | 6700 aa | MW at kD


NOV35a, CG119566-01 Protein Sequence


MADDEDYEEVVEYYTEEVVYEEVPGETITKIYETTTTRTSDYEQSETSKPALAQPALAQPASA

KPVERRKVIRKKVDPSKFMTPYIAHSQKMQDLFSPNKYKEKFEKTKGQPYASTTDTPELRRIK 

KVQDQLSEVKYRMDGDVAKTICHVDEKAKDIEHAKKVSQQVSKVLYKQNWEDTKDKYLLPPDA 

PELVQAVKNTAMFSKKLYTEDWEADKSLFYPYNDSPELRRVAQAQKALSDVAYKKGLAEQQAQ 

FTPLADPPDIEFAKKVTNQVSKQKYKEDYENKIKGKWSETPCFEVANARMNADNISTRKYQED 

FENMKDQIYFMQTETPEYKMNKKAGVAASKVKYKEDYEKNKGKADYNVLPASENPQLRQLKAA 

GDALSDKLYKENYEKTKAKSINYCETPKFKLDTVLQNFSSDKKYKDSYLKDILGHYVGSFEDP

YHSHCMKVTAQNSDKNYKAEYEEDRGKGFFPQTITQEYEAIKKLDQCKDHTYKVHPDKTKFTQ 

VTDSPVLLQAQVNSKQLSDLNYKAKHESEKFKCHIPPDTPAFIQHKVNAYNLSDNLYKQDWEK 

SKAKKFDIKVDAIPLLAAKANTKNTSDVMYKKDYEKNKGKMIGVLSINDDPKMLHSLKVAKNQ 

SDRLYKENYEKTKAKSMNYCETPKYQLDTQLKNFSEARYKDLYVKDVLGHYVGSMEDPYHTHC 

MKVAAQNSDKSYKAEYEEDKGKCYFPQTITQEYDAIKKLDQCKDHTYKVHPDKTKFTAVTDSP 

VLLQAQLNTKQLSDLNYKAKHEGERFKCHIPADAPQFIQHRVNAYNLSDNVYKQDWEKSKAKK 

FDIKVDAIPLLAAKANTKNTSDVMYKKDYEKSKGKMIGALSINDDPKMLHSLKTAKNQSDREY 

RKDYEKSKTIYTAPLDMLQVTQAKKSQAIASDVDYKHILHSYSYPPDSINVDLAKKAYALQSD 

VEYKADYNSWMKGCGWVPFGSLEMEKAKRASDILNEKKYRQHPDTLKFTSIEDAPITVQSKIN 

QAQRSDIAYKAKGEEIIHNYNLPPDLPQFIQAKVNAYNISENMYKADLKDLSKKGYDLRTDAI 

PIRAAKAARQAASDVQYKKDYEKAKGKMVGFQSLQDDPKLVHYMNVAKIQSDREYKKDYEKTK 

SKYNTPHDMFNVVAAKKAQDVVSNVNYKHSLHHYTYLPDAMDLELSKNMMQIQSDNVYKEDYN

NWMKGIGWIPIGSLDVEKVKKAGDALNEKKYRQHPDTLKFTSIVDSPVMVQAKQNTKQVSDIL 

YKAKGEDVKHKYTMSPDLPQFLQAKCNAYSISDVCYKRDWHDLIRKGNNVLGDAIPITAAKAS 

RNIASDYKYKEAYEKSKGKHVGFRSLQDDPKLVHYMNVAKLQSDREYKKNYENTKTSYHTPGD 

MVTITAAKMAQDVATNVNYKQPLHHYTYLPDAMSLEHTRNVNQIQSDNVYKDEYNSFLKGIGW 

IPIGSLEVEKVKKAGDALNERKYRQHPDTVKFTSVPDSMGMMLAQHNTKQLSDLNYKVEGEKL 

KHKYTIDPELPQFIQAKVNALNMSDAHYKADWKKTIRKGYDLRPDAIPIVAAKSSRNIASDCK 

YKEAYEKAKGKQVGFLSLQDDPKLVHYMNVAKIQSDREYKKGYEASKTKYHTPLDMVSVTAAK 

KSQEVATNANYRQSYHHYTLLPDALNVEHSRNAMQIQSDNLYKSDFTNWMKGIGWVPIESLEV 

EKAKKAGEILSEKKYRQHPEKLKFTYAMDTMEQALNKSNKLNMDKRLYTEKWNKDKTTIHVMP 

DTPDILLSRVNQITMSDKLYKAGWEEEKKKGYDLRPDAIAIKAARASRDIASDYKYKKAYEQA 

KGKHIGFRSLEDDPKLVHFMQVAKMQSDREYKKGYEKSKTSFHTPVDMLSVVAAKKSQEVATN

ANYRNVIHTYNMLPDAMSFELAKNMMQIQSDNQYKADYADFMKGIGWLPLGSLEAEKNKKAME 

IISEKKYRQHPDTLKYSTLMDSMNMVLAQNNAKIMNEHLYKQAWEADKTKVHIMPDIPQIILA 

KANAINISDKLYKLSLEESKKKGYDLRPDAIPIKAAKASRDIASDYKYKYNYEKGKGKMVGFR 

SLEDDPKLVHSMQVAKMQSDREYKKNYENTKTSYHTPADMLSVTAAKDAQANITNTNYKHLIH 

KYILLPDAMNIELTRNMNRIQSDNEYKQDYNEWYKGLGWSPAGSLEVEKAKKATEYASDQKYR 

QHPSNFQFKKLTDSMDMVLAKQNAHTMNKHLYTIDWNKDKTKIHVMPDTPDILQAKQNQTLYS 

QKLYKLGWEEALKKGYDLPVDAISVQLAKASRDIASDYKYKQGYRKQLGHHVGFRSLQDDPKL 

VLSMNVAKMQSEREYKKDFEKWKTKFSSPVDMLGVVLAKKCQELVSDVDYKNYLHQWTCLPDQ

NDVVQAKKVYELQSENLYKSDLEWLRGIGWSPLGSLEAEKNKRASEIISEKKYRQPPDRNKFT

SIPDAMDIVLAKTNAKNRSDRLYREAWDKDKTQIHIMPDTPDIVLAKANLINTSDKLYRMGYE 

ELKRKGYDLPVDAIPIKAAKASREIASEYKYKEGFRKQLGHHIGARNIEDDPKMMWSMHVAKI 

QSDREYKKDFEKWKTKFSSPVDMLGVVLAYKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQA 

YDLQSDNLYKSDLQWLKGIGWMTSGSLEDEKNKRATQILSDHVYRQHPDQFKFSSLMDSIPMV

LAKNNAITMNHRLYTEAWDKDKTTVHIMPDTPEVLLAKQNKVNYSEKLYKLGLEEAKRKGYDM 
RVDAIPIKAAKASRDIASEFKYKEGYRKQLGHHIGARAIRDDPKMMWSMHVAKIQSDREYKKD 
FEKWKTKFSSPVDMLGVVLAKKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQAYDLQSDNMY
KSDLQWMRGIGWVSIGSLDVEKCKRATEILSDKIYRQPPDRFKFTSVTDSLEQVLAKNNALNM 
NKRLYTEAWDKDKTQIHIMPDTPEIMLARQNKINYSETLYKLANEEAKKKGYDLRSDAIPIVA 
AKASRDVISDYKYKDGYRKQLGHHIGARNIEDDPKMMWSMHVAKIQSDREYKKDFEKWKTKFS 
SPVDMLGVVLAKKCQTLVSDVDYKNYLHEWTCLPDQNDVIHARQAYDLQSDNIYKSDLQWLRG
IGWVPIGSMDVVKCKRAAEILSDNIYRQPPDKLKFTSVTDSLEQVLAKNNALNMNKRLYTEAW 
DKDKTQVHIMPDTPEIMLARQNKINYSESLYRQAMEEAKKEGYDLRSDAIPIVAAKASRDIAS 
DYKYKEAYRKQLGHHIGARAVHDDPKIMWSLHIAKVQSDREYKKDFEKYKTRYSSPVDMLGIV 
LAKKCQTLVSDVDYKHPLHECICLPDQNDIIHARKAYDLQSDNLYKSDLEWMKGIGWVPIDSL 
EVVRAKRAGELLSDTIYRQRPETLKFTSITDTPEQVLAKNNALNMNKRLYTEAWDNDKKTIHV
mpdtpeimlaklnrinysdklyklaleeskkegydlrldaipiqaakasrdiasdykykegyr

kqlghhigarnikddpkmmwsihvakiqsdreykkefekwktkfsspvdmlgvvlakkcqilv 

sdidykhplhewtclpdqndviqarkaydlqsdaiyksdlewlrgigwvpigsvevekvkrag 

eilsdrkyrqpadqlkftcitdtpeivlaknnaltmskhlyteawdadktsihvmpdtpdill 

aksnsanisqklytkgwdeskmkdydlradaisiksakasrdiasdykykeayekqkghhiga 

QSIEDDPKIMCAIHAEKIQSEREYKKEFQKWKTKFSSPVDMLSILLAKKCQTLVTDIYYRNYL

HEWTCMPDQNDIIQAKKAYDLQSDALYKADLEWLRGIGWMPQGSPEVLRVKNAQNIFCDSVYR 

TPVVNLKYTSIVDTPEVVLAKSNAENISIPKYREVWDKDKTSIHIMPDTPEINLARANALNVS

NKLYREGWDEMKAGCDVRLDAIPIQAAKASREIASDYKYKLDHEKQKGHYVGTLTARDDNKIR 

WALIADKLQNEREYRLDWAKWKAKIQSPVDMLSILHSKNSQALVSDMDYRNYLHQWTCMPDQN 

DVIQAKKAYELQSDNVYKADLEWLRGIGWMPNDSVSVNHAKHAADIFSEKKYRTKIETLNFTP 

VDDRVDYVTAKQSGEILDDIKYRKDWNATKSKYTLTETPLLHTAQEAARILDQYLYKEGWERQ 

KATGYILPPDAVPFVHAHHCNDVQSELKYKAEHVKQKGHYVGVPTMRDDPKLVWFEHAGQIQN 

ERLYKEDYHKTKAKINIPADMVSVLAAKQGQTLVSDIDYRNYLHQWMCHPDQNDVIQARKAYD 

LQSDNVYRADLEWLRGIGWIPLDSVDHVRVTKNQEMMSQIKYKKNALENYPNFTSVVDPPEIV 

LAKINSVNQSDVKYKETFNKAKGKYTFSPDTPHISHSKDMGKLYSTILYKGAWEGTKAYGYTL 

DERYIPIVGAKHADLVNSELKYKETYEKQKGHYLAGKVIGEFPGVVHCLDFQKMRSALNYRKH

YEDTKANVHIPNDMMNHVLAKRCQYILSDLEYRHYFHQWTSLLEEPNVIRVRNAQEILSDNVY

KDDLNWLKGIGCYVWDTPQILHAKKSYDLQSQLQYTAAGKENLQNYNLVTDTPLYVTAVQSGI 

NASEVKYKENYHQIKDKYTTVLETVDYDRTRNLKNLYSSNLYKEAWDRVKATSYILPSSTLSL 

THAKNQKHLASHIKYREEYEKFKALYTLPRSVDDDPNTARCLRVGKLNIDRLYRSVYEKNKMK 

IHIVPDMVEMVTAKDSQKKVSEIDYRLRLHEWICHPDLQVNDHVRKVTDQISDIVYKDDLNWL 

KGIGCYVWDTPEILHAKHAYDLRDDIKYKAHMLKTRNDYKLVTDTPVYVQAVKSGKQLSDAVY 

HYDYVHSVRGKVAPTTKTVDLDRALHAYKLQSSNLYKTSLRTLPTGYRLPGDTPHFKHIKDTR 

YMSSYFKYKEAYEHTKAYGYTLGPKDVPFVHVRRVNNVTSERLYRELYHKLKDKIHTTPDPPE 

IRQVKKTQEAVSELIYKSDFFKMQGHMISLPYTPQVIHCRYVGDITSDIKYKEDLQVLKGFGC 

FLYDTPDMVRSRHLRKLWSNYLYTDKAREMRDKYKVVLDTPEYRKVQELKTHLSELVYRAAGK

KQKSIFTSVPDTPDLLRAKRGQKLQSQYLYVELATKERPHHHAGNQTTALKHAKDVKDMVSEK 

KYKIQYEKMKDKYTPVPDTPILIRAKRAYWNASDLRYKETFQKTKGKYHTVKDALDIVYHRKV 

TDDISKIKYKENYMSQLGIWRSIPDRPEHFHHRAVTDTVSDVKYKEDLTWLKGIGCYAYDTPD 

FTLAEKNKTLYSKYKYKEVFERTKSDFKYVADSPINRHFKYATQLMNEKKYRADYEQRKDKYH 

LVVDEPRHLLAKTRSDQISQIKYRKNYEKSKDKFTSIVDTPEHLRTTKVNKQISDILYKLEYN

KAKPRGYTTIHDTPMLLHVRKVKDEVSDLKYKEVYQRNKSNCTIEPDAVHIKAAKDAYKVNTN 

LDYKKQYEANKAHWKWTPDRPDFLQAAKSSLQQSDFEYKLDREFLKGCKLSVTDDKNTVLALR 

NTLIESDLKYKEKHVKERGTCHAVPDTPQILLAKTVSNLVSENKYKDHVKKHLAQGSYTTLPE 

TRDTVHVKEVTKHVSDTNYKKKFVKEKGKSNYSIMLEPPEVKHAMEVAKKQSDVAYRKDAKEK 

LHYTTVADRPDIKKATQAAKQASEVEYRAKHRKEGSHGLSMLGRPDIEMAKKAAKLSSQVKYR 

ENFDKEKGKTPKYNPKDSQLYKVMKDANNLASEVKYKADLKKLHKPVTDMKESLIMNHVLNTS

QLASSYQYKKKYEKSKGHYHTIPDNLEQLHLKEATELQSIVKYKEKYEKERGKPMLDFETPTY

ITAKESQQMQSGKEYRKDYEESIKGRNLTGLEVTPALLHVKYATKIASEKEYRKDLEESIRGK 

GLTEMEDTPDMLRAKNATQILNEKEYKRDLELEVKGRGLNAMANETPDFMRARNATDIASQIK 

YKQSAEMEKANFTSVVDTPEIIHAQQVKNLSSQKKYKEDAEKSMSYYETVLDTPEIQRVRENQ

KNFSLLQYQCDLKNSKGKITVVQDTPEILRVKENQKNFSSVLYKEDVSPGTAIGKTPEMMRVK 

QTQDHISSVKYKEAIGQGTPIPDLPEVKRVKETQKHISSVMYKENLGTGIPIPITPEIERVKH 

NQENLSSVLYKENMGKGTPLPVTPEMERVKHNQENISSVLYKENMGKGTPLPVTPEMERVKHN 

QENISSVLYKENMGKGTPLPVTPEMERVKHNQENISSVLYKENVGKATATTVTPEMQRVKRNQ

ENFSSVLYKENLGKATPTPFTPEMERVKRNQENISSVLYKENMRKATPTPVTPEMERAKRNQE 

NISSVLYSDSFRKQIQGKAAYVLDTPEMRRVRETQRHISTVKYHEDFEKHKGCFTPVVTDPIT

ERVKKNMQDFSDINYRGIQRKVVEMEQKRNDQDQETITGLRVWRTNPGSVFDYDPAEDNIQSR

SLHMINVQAQRRSREQSRSASALSVSGGEEKSEHSEAPDHHLSTYSDGGVFAVSTAYKHAKTT 

ELPQQRSSSVATQQTTVSSIPSHPSTAGKIFRAMYDYMAADADEVSFKDGDAIINVQAIDEGW 

MYGTVQRTGRTGMLPANYVEAI 
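
The ORF coordinates and the protein length reported for NOV35a can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the disclosed sequences. It assumes the ORF line reads "ATG at 19" and "TAG at 20119" (1-based positions of the first base of each codon), and the helper name orf_codon_count is hypothetical.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Return the number of sense codons in an ORF.

    start: 1-based position of the first base of the start codon (ATG).
    stop:  1-based position of the first base of the stop codon.
    Both coordinates are assumed to lie on the same strand.
    """
    span = stop - start
    # The stop codon must lie downstream of, and in frame with, the start codon.
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


if __name__ == "__main__":
    # Assumed coordinates taken from the ORF line of the NOV35a table.
    print(orf_codon_count(19, 20119))
```

Under these assumptions the span between the two positions corresponds to 6700 codons, which agrees with the 6700 aa length given for SEQ ID NO: 140.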



SEQ ID NO: 141 | 20881 bp

NOV35b, CG119566-02 DNA Sequence



CCCCACCTTTTGAGCAAGTTCAGCCTGGTTAAGTCCAAGCTGGTGATAAAACTACAAAGCAGA 



ATACGAAGAAGACAGAGGCAAAGGCTTCTTCCCTCAGACCATAACTCAAGAATATGGGGGTCT 



CGCAGTAATTTATGCTCTTTGCTTTTGTCTTTTCATAGTTTTCCTTGTATAGTTTGTCACTTA 



GGGCATCTCCTGCTGCCTTCAGCTGCCTAAGCTGTGGGTTCTCTGAAGCAGGAAGCACATTAT 



AATCTGCTTTTCCTTTATTCTTTTCATAGTCTTCTTTGTATTTTGCTGCTGAGGAAATTTATT 



TGGTAGATTGAAGGTTTGAACGAGAGCTACAGAAACGAAAGAAAAAGTCTGTATAAGCCAATG 



GTGTTCGGGAAGAAAATAACCCCATTGCCTTGAGTTTGTAGGTGCCACTACTACTCTGAAAAA 



TGGCAGATGACGAAGACTATGAGGAGGTGGTGGAGTACTACACAGAAGAAGTGGTTTACGAAG 
AGGTGCCGGGAGAGACAATAACAAAAATTTATGAGACTACGACAACAAGGACATCTGACTATG 
AGCAATCAGAAACTTCCAAACCAGCTCTGGCACAGCCAGCACTGGCACAGCCAGCATCAGCAA 
AGCCGGTGGAGAGGAGGAAGGTCATCCGGAAGAAAGTGGATCCTTCAAAGTTCATGACCCCCT 
ACATTGCACACAGTCAGAAAATGCAGGATCTTTTTAGCCCAAATAAATACAAGGAGAAGTTTG
agaaaacaaaaggacagccatacgccagcacaacagatactccagaacttcgcagaatcaaaa
aagtacaagatcaactcagtgaggttaagtatcgaatggatggtgatgttgctaagactatat
gtcacgtagatgaaaaagcaaaggatattgaacatgcaaagaaagtgtcgcagcaagtcagta 
aggttttatacaagcagaactgggaagacaccaaggataagtacctgcttcctcctgatgccc 
ctgaacttgtccaggccgttaagaacaccgccatgttcagcaagaaactgtacactgaagact 

aggcccagaaagctctcagtgatgttgcctacaaaaaaggtctcgctgaacagcaagctcaat 

TCACGCCTCTGGCCGATCCTCCAGATATAGAATTTGCCAAGAAAGTAACCAATCAAGTGAGCA 
AGCAAAAATACAAAGAAGACTATGAAAATAAAATCAAAGGCAAATGGAGTGAGACACCTTGCT 
TTGAAGTTGCAAATGCCAGAATGAATGCTGATAACATTAGCACAAGGAAATACCAGGAAGATT 
TTGAAAACATGAAAGACCAGATCTACTTCATGCAGACCGAAACACCAGAGTATAAAATGAATA 
AAAAAGCTGGTGTGGCAGCTAGCAAGGTAAAATACAAAGAAGACTATGAAAAGAATAAAGGAA 
AAGCAGATTATAATGTGCTTCCTGCTTCAGAGAACCCACAGCTTAGGCAGCTGAAGGCAGCAG 
GAGATGCCCTAAGTGACAAACTATACAAGGAAAACTATGAAAAGACAAAAGCAAAGAGCATAA 
ATTACTGCGAGACCCCCAAATTCAAGCTCGATACTGTTCTGCAGAACTTCAGTAGTGATAAAA 
AATATAAAGATTCCTACTTAAAAGATATTTTGGGACATTATGTAGGCAGCTTCGAGGATCCAT 
ACCATTCACACTGCATGAAAGTCACAGCTCAAAACAGTGATAAAAACTACAAAGCAGAATACG 
AAGAAGACAGAGGCAAAGGCTTCTTCCCTCAGACCATAACTCAAGAATATGAAGCAATTAAGA 
AACTAGATCAGTGTAAAGACCACACCTACAAAGTCCATCCAGATAAGACAAAATTCACCCAAG 
TTACAGACTCTCCTGTTCTGCTAGAAGCCCAAGTCAATTCCAAACAACTGAGTGACTTAAATT
ACAAAGCAAAACATGAAAGTGAAAAGTTCAAGTGCCATATCCCCCCTGATACTCCTGGTTTTA
TCCAGCACAAAGTCAATGCCTATAACTTGAGTGATAATCTTTATAAGCAAGACTGGGAGAAGA 
GCAAAGCCAAAAAGTTTGACATTAAAGTGGATGCCATTCCCCTGCTGGCAGCCAAAGCCAACA 
CCAAGAACACCAGCGATGTGATGTACAAGAAAGACTATGAAAAAAACAAAGGGAAAATGATTG 
GAGTCCTCAGCATTAATGACGATCCCAAGATGCTGCACTCCTTGAAGGTGGCCAAAAACCAGA 
GTGATAGATTATACAAGGAAAACTATGAGAAGACAAAGGCAAAGAGTATGAATTACTGTGAGA
CCCCAAAATATCAACTTGATACTCAGCTGAAGAACTTCAGTGAGGCTAGATATAAAGACTTAT
ATGTAAAGGATGTTTTGGGACATTATGTAGGCAGCATGGAGGACCCATATCACACACACTGCA 
TGAAAGTTGCAGCTCAAAACAGTGATAAAAGTTACAAAGCAGAATATGAAGAAGATAAAGGAA 
AATGCTATTTCCCTCAGACAATAACACAAGAATATGACGCAATCAAGAAGCTGGACCAGTGTA 
AAGATCATACCTACAAAGTTCATCCAGATAAGACCAAATTCACGGCAGTCACTGATTCTCCTG
TACTGTTGCAAGCCCAGCTCAACACGAAACAGCTTAGTGATCTGAATTACAAAGCAAAACATG 
AAGGTGAGAGGTTCAAGTGCCATATACCAGCAGATGCTCCACAGTTTATCCAACACAGAGTCA 
ATGCCTATAATCTGAGTGATAATGTTTATAAGCAAGACTGGGAGAAGAGCAAAGCCAAGAAGT 
TTGACATTAAAGTGGACGCCATTCCCCTGTTGGCAGCCAAAGCCAACACCAAGAACACCAGCG 
ATGTGATGTACAAGAAAGACTATGAAAAGAGCAAAGGGAAAATGATTGGAGCCCTCAGCATTA 
ATGACGATCCAAAGATGCTGCACTCCTTGAAGACAGCCAAAAACCAGAGTGATCGCGAATATC 
GAAAAGATTATGAAAAGTCAAAAACTATCTACACGGCACCTCTTGATATGCTCCAAGTCACTC 
AAGCTAAGAAATCTCAGGCAATTGCCAGCGACGTTGATTATAAGCACATCTTACACAGTTACA
GCTACCCCCCTGATAGCATCAATGTGGACCTTGCCAAGAAGGCATATGCGCTGCAGAGCGATG 
TTGAATACAAAGCTGACTACAATAGCTGGATGAAAGGTTGTGGCTGGGTGCCTTTTGGGTCCT 
TAGAAATGGAAAAGGCAAAGCGAGCTTCAGACATCCTCAATGAGAAAAAATATCGCCAACATC 
CAGACACCCTCAAGTTTACCTCGATTGAAGATGCTCCAATTACAGTACAGTCTAAAATTAACC 
AGGCCCAGAGGAGTGATATCGCTTACAAAGCCAAAGGAGAGGAAATTATTCACAATTACAACC 
TGCCACCAGACCTGCCCCAGTTCATCCAGGCTAAAGTTAATGCCTACAATATCAGTGAGAATA 
TGTACAAAGCAGACTTGAAAGACTTGAGCAAGAAGGGATATGACCTGAGAACTGATGCGATTC
CCATCAGAGCTGCCAAAGCTGCCAGGCAGGCGGCGAGTGACGTTCAGTACAAAAAAGACTATG
AAAAGGCTAAAGGGAAAATGGTTGGCTTCCAAAGTCTTCAAGATGACCCTAAACTGGTTCATT 
ATATGAACGTGGCCAAGATACAATCAGATCGGGAGTATAAAAAAGACTATGAGAAGACAAAGT 
CCAAATACAACACGCCCCATGATATGTTCAATGTCGTGGCGGCTAAGAAAGCCCAGGATGTGG 
TCAGCAATGTCAACTATAAGCATTCTCTCCATCATTACACCTACTTGCCTGACGCCATGGACC 
TGGAGCTGTCTAAGAACATGATGCAGATACAGAGTGATAACGTCTACAAGGAAGACTACAACA
ACTGGATGAAAGGCATTGGCTGGATTCCTATTGGCAGTCTCGACGTCGAAAAAGTTAAAAAGG 
CCGGTGATGCTCTGAATGAAAAGAAGTACAGGCAACATCCAGACACCCTCAAATTTACCAGCA 
TTGTGGACTCCCCAGTTATGGTCCAGGCAAAACAGAACACGAAGCAAGTCAGTGATATCTTAT 
ACAAGGCTAAAGGAGAAGATGTGAAACATAAATACACCATGAGTCCTGATCTTCCTCAGTTTC 
TCCAGGCCAAGTGCAATGCTTACAGTATAAGTGACGTCTGTTATAAACGGGATTGGCATGACT 
TAATACGCAAGGGCAACAATGTGCTGGGCGATGCTATTCCCATCACTGCAGCCAAGGCATCGA 
GAAACATTGCCAGTGATTATAAATACAAGGAAGCTTATGAGAAGTCAAAGGGAAAGCATGTGG 
GTTTCAGAAGCCTCCAGGATGATCCCAAGCTGGTCCACTATATGAATGTGGCAAAGCTGCAGT 
CTGATCGTGAATACAAGAAGAACTATGAGAACACCAAAACCAGCTACCATACCCCTGGGGACA 
TGGTTACGATCACAGCTGCAAAGATGGCCCAGGATGTCGCTACCAATGTCAACTACAAACAGC 
CATTGCATCATTACACATACCTACCTGACGCCATGAGTCTTGAGCATACGAGGAATGTCAATC 
AAATTCAGAGTGATAATGTGTATAAAGACGAGTATAACAGCTTCTTGAAGGGCATCGGATGGA
TCCCTATTGGTTCCCTGGAGGTGGAGAAGGTCAAGAAAGCAGGCGATGCATTAAATGAGAGGA
AGTATCGACAGCACCCAGATACCGTCAAGTTCACAAGTGTGCCTGATTCCATGGGCATGATGT
TGGCTCAGCATAACACAAAGCAGCTAAGTGATTTGAACTACAAGGTAGAGGGAGAGAAACTGA

AGCACAAGTATACTATTGACCCTGAATTGCCTCAGTTTATTCAAGCCAAAGTCAACGCCCTCA 

ACATGAGTGATGCTCATTATAAAGCAGATTGGAAGAAAACCATTCGCAAGGGCTATGATTTGA 

GACCAGATGCCATCCCAATTGTTGCTGCAAAAAGTTCAAGGAATATTGCTAGTGATTGCAAAT 

ATAAGGAGGCCTACGAGAAAGCCAAAGGCAAGCAAGTTGGATTTCTCAGTCTTCAGGATGATC 

CTAAACTGGTTCACTACATGAATGTGGCCAAAATCCAGTCTGATCGTGAGTACAAAAAGGGCT 

ATGAAGCCAGCAAGACCAAGTACCACACACCTCTGGATATGGTCAGTGTGACAGCTGCAAAGA 

AATCTCAGGAGGTTGCCACCAACGCCAACTACAGACAGTCATACCACCACTACACTCTCCTGC 

CCGATGCCTTGAATGTGGAGCACTCCAGGAATGCCATGCAGATTCAGAGTGATAATCTGTACA

AATCTGACTTCACCAATTGGATGAAAGGGATCGGCTGGGTGCCCATAGAGTCCCTGGAGGTGG 

AGAAGGCAAAGAAAGCAGGAGAGATTCTTAGTGAGAAGAAGTATCGCCAGCACCCCGAGAAGC 

TGAAGTTCACTTACGCCATGGACACAATGGAACAGGCACTTAACAAGAGTAACAAACTGAACA 

TGGACAAGAGGCTCTACACTGAAAAATGGAACAAGGACAAGACCACCATTCATGTCATGCCTG

ACACACCGGATATTTTACTCTCCAGAGTAAACCAAATCACCATGAGTGATAAACTGTACAAAG 

CTGGCTGGGAAGAGGAAAAGAAGAAAGGATATGACCTGAGGCCTGATGCCATTGCAATAAAGG

CTGCAAGAGCCTCTAGAGACATTGCCAGTGATTACAAATACAAGAAAGCCTATGAACAAGCCA

AAGGGAAACACATTGGCTTCCGGAGCCTGGAAGATGACCCCAAGCTGGTGCACTTCATGCAAG 

TGGCCAAGATGCAGTCAGACCGGGAATACAAGAAGGGATATGAGAAATCCAAGACCTCCTTCC 

ACACCCCGGTGGACATGCTCAGTGTGGTGGCAGCCAAGAAGTCTCAGGAAGTGGCCACCAATG

CCAACTACAGGAACGTGATCCATACCTACAACATGCTTCCTGATGCCATGAGCTTTGAATTGG

CCAAAAATATGATGCAGATTCAAAGTGATAATCAGTACAAGGCTGACTATGCTGACTTCATGA

AGGGCATTGGATGGCTCCCTCTGGGCTCCCTGGAAGCAGAGAAAAACAAGAAAGCCATGGAGA 

TTATTAGTGAAAAGAAGTACCGCCAGCACCCAGACACTTTGAAGTATTCCACACTCATGGACT 

CGATGAACATGGTTTTGGCCCAGAATAATGCAAAAATTATGAACGAACATCTCTACAAACAAG 

CATGGGAGGCTGACAAAACCAAAGTCCACATCATGCCTGATATCCCCCAGATTATTTTGGCAA

AGGCAAATGCAATTAATATAAGTGATAAACTCTACAAACTTTCCTTGGAAGAGTCTAAAAAGA 

AAGGCTATGATCTCAGACCTGATGCAATTCCTATCAAAGCTGCCAAGGCTTCCAGAGATATTG

CAAGTGATTATAAATACAAGTACAATTATGAAAAAGGGAAGGGGAAAATGGTTGGTTTCCGCA 

GTCTCGAGGATGATCCCAAATTAGTCCATTCCATGCAAGTGGCTAAGATGCAATCTGATCGGG 

AGTACAAGAAAAACTATGAGAACACAAAGACCAGCTACCACACCCCTGCCGACATGCTCAGTG 

TCACGGCTGCAAAGGATGCCCAAGCCAACATCACCAACACTAACTACAAGCACCTGATTCACA 

AGTACATCCTCCTTCCAGATGCAATGAACATTGAGCTGACCAGGAATATGAATCGCATACAGA 

GTGATAATGAATATAAGCAAGATTACAATGAATGGTACAAAGGGCTTGGCTGGAGTCCAGCAG 

GTTCTCTGGAAGTGGAGAAGGCCAAGAAAGCAACTGAATATGCCAGTGATCAGAAATACCGCC 

AGCACCCGAGCAACTTCCAGTTTAAGAAGCTGACTGATTCCATGGACATGGTGCTTGCCAAGC 

AGAATGCACATACCATGAACAAGCATTTATACACCATTGATTGGAATAAAGATAAGACCAAGA 

TTCATGTGATGCCTGATACACCAGATATTTTACAAGCCAAGCAGAATCAAACACTGTATAGTC 

AGAAACTCTATAAACTTGGATGGGAAGAAGCTTTGAAGAAAGGCTATGATCTCCCAGTTGATG 

CAATTTCTGTACAGCTAGCTAAAGCTTCAAGAGACATTGCTAGTGATTATAAATACAAACAAG 

GCTACCGAAAGCAACTTGGCCACCATGTTGGATTCCGGAGTCTGCAAGATGACCCAAAACTTG

TGTTGTCCATGAATGTAGCCAAAATGCAGAGTGAAAGAGAATACAAGAAGGACTTTGAGAAGT 

GGAAAACTAAGTTCTCCAGCCCAGTGGACATGTTGGGAGTGGTACTGGCCAAGAAGTGTCAGG 

AGTTGGTTAGTGACGTGGACTACAAGAACTACCTGCATCAGTGGACATGTCTGCCTGATCAGA 

ACGATGTTGTGCAAGCTAAGAAAGTTTATGAACTGCAAAGTGAGAATCTATATAAATCTGACC

TTGAGTGGCTGAGAGGCATAGGATGGAGTCCCTTGGGTTCTTTAGAGGCAGAAAAGAACAAGC 

GGGCTTCGGAAATCATCAGTGAGAAGAAATATCGTCAGCCTCCAGACAGAAACAAGTTCACCA 

GCATTCCTGATGCCATGGATATAGTTCTGGCAAAGACAAATGCCAAAAATAGGAGTGATAGAC 

TTTATAGAGAAGCTTGGGACAAAGACAAGACTCAGATCCACATCATGCCTGATACACCTGACA 

TTGTTCTGGCTAAAGCAAACTTAATCAACACAAGTGATAAACTCTACCGAATGGGTTATGAGG 

AGCTGAAGAGAAAAGGTTACGATCTTCCTGTTGATGCCATACCAATCAAAGCAGCAAAAGCCT

CCCGGGAAATTGCCAGTGAATACAAGTACAAGGAAGGCTTTCGCAAGCAGCTCGGCCACCACA 

TTGGTGCCCGGAACATTGAAGATGACCCCAAGATGATGTGGTCCATGCATGTGGCCAAGATCC 

AGAGTGACAGGGAGTACAAGAAGGACTTTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGG 

ACATGCTGGGGGTGGTGTTGGCCTATAAGTGCCAGACCTTAGTCAGCGACGTGGACTACAAGA 

ACTACCTGCACCAGTGGACATGCCTGCCCGACCAGAGCGATGTCATCCATGCTCGGCAGGCCT 

ATGACCTCCAGAGCGATAATTTGTACAAGTCAGACCTTCAGTGGCTAAAAGGCATTGGCTGGA 

TGACTAGTGGTTCTCTCGAGGATGAGAAAAATAAACGAGCCACCCAGATTTTGAGTGACCATG 

TTTACCGTCAGCACCCAGATCAATTTAAGTTTTCCAGCCTTATGGATTCCATACCAATGGTTT 

TGGCAAAAAACAATGCTATTACCATGAATCATCGCCTCTATACAGAAGCTTGGGATAAAGATA 

AAACCACTGTCCACATTATGCCAGATACCCCTGAAGTTTTATTAGCTAAACAAAACAAAGTAA 

ATTACAGTGAGAAATTGTATAAGCTTGGCCTAGAAGAAGCCAAGAGGAAAGGTTATGACATGC 

GGGTAGATGCCATTCCTATCAAGGCAGCCAAGGCCTCCAGAGATATTGCAAGTGAATTCAAGT 

ACAAAGAAGGCTATCGTAAGCAGCTCGGCCACCACATTGGTGCCCGAGCTATACGTGATGACC 

CCAAGATGATGTGGTCCATGCACGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGACT 

TTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGCTGGCCAAGA

AGTGCCAGACCTTAGTCAGCGATGTGGACTACAAGAACTACCTGCACCAGTGGACATGCCTGC
CCGACCAGAGCGACGTCATCCATGCTCGGCAGGCCTATGACCTCCAGAGCGATAATATGTACA

AGTCTGATCTCCAGTGGATGAGAGGCATTGGCTGGGTGTCCATTGGCTCTTTGGATGTGGAAA 

AATGCAAAAGGGCAACTGAAATTTTGAGTGATAAAATCTATCGCCAGCCTCCAGACAGATTCA 

AATTTACCAGTGTGACTGACTCTCTGGAACAAGTGCTGGCCAAGAACAATGCTCTCAACATGA

ATAAGCGTTTATACACAGAGGCCTGGGACAAAGACAAGACTCAAATTCACATAATGCCTGATA 

CACCAGAGATTATGTTGGCAAGGCAGAACAAAATCAACTACAGTGAGACTCTATACAAACTTG

CCAATGAAGAAGCAAAAAAGAAAGGCTACGACTTGCGAAGTGACGCCATCCCCATCGTGGCTG 

CCAAGGCCTCCAGGGACGTTATCAGTGATTACAAATACAAAGATGGTTACCGCAAGCAGCTCG 

GCCACCACATTGGAGCCCGGAACATTGAAGATGACCCCAAGATGATGTGGTCCATGCATGTGG

CCAAGATCCAGAGTGACAGGGAGTATAAGAAGGACTTTGAGAAGTGGAAGACCAAGTTCAGCA

GCCCAGTGGACATGCTGGGAGTGGTGTTAGCCAAGAAGTGCCAGACCTTAGTCAGCGATGTGG 

ACTACAAGAACTACCTGCACGAGTGGACGTGCCTGCCCGACCAGAATGATGTCATCCATGCTC 

GGCAGGCCTATGACCTCCAGAGCGATAACATTTACAAATCTGATCTCCAGTGGCTGAGAGGCA 

TTGGCTGGGTCCCCATTGGGTCTATGGATGTGGTCAAGTGCAAGAGAGCTGCTGAAATACTGA 

GTGATAACATCTACCGCCAGCCTCCGGACAAGCTGAAATTTACCAGTGTGACTGACTCTCTAG 

AGCAGGTGCTGGCCAAGAACAATGCTCTCAATATGAACAAGCGCTTATACACAGAAGCCTGGG 

ACAAAGACAAGACCCAAGTCCATATTATGCCTGATACACCTGAAATCATGTTGGCAAGACAAA 

ATAAAATAAATTATAGTGAGAGCCTCTATCGTCAGGCCATGGAAGAAGCCAAGAAAGAAGGCT

ATGACTTGAGAAGTGATGCCATTCCCATTGTGGCTGCCAAGGCCTCTCGGGATATTGCCAGTG 

ATTACAAATACAAAGAAGCATATCGTAAGCAGTTGGGTCACCACATTGGCGCCCGAGCAGTAC

ACGATGACCCCAAGATAATGTGGTCCCTCCACATTGCCAAAGTGCAGAGTGACCGTGAGTACA 

AGAAAGATTTTGAGAAATACAAGACAAGGTACAGCAGCCCAGTGGACATGCTTGGTATCGTTT 

TGGCCAAGAAGTGTCAGACCTTGGTCAGCGATGTGGACTATAAACATCCTCTGCATGAATGCA

TCTGCCTGCCCGACCAGAATGACATCATTCATGCACGGAAAGCCTATGACCTCCAGAGTGACA 

ATTTGTATAAGTCAGACCTTGAATGGATGAAAGGCATTGGCTGGGTTCCGATTGATTCCTTGG 

AAGTTGTTAGGGCCAAGAGAGCTGGAGAATTACTTAGTGATACTATCTACCGTCAGCGTCCAG 

AAACGCTGAAATTTACCAGTATAACGGACACTCCGGAGCAGGTGCTGGCAAAAAACAATGCTT

TAAACATGAATAAGCGCTTATATACTGAAGCCTGGGACAATGACAAGAAAACTATTCATGTCA

TGCCTGATACACCAGAAATCATGTTAGCCAAACTCAACCGAATAAACTACAGTGATAAACTCT 

ATAAACTTGCTTTGGAAGAGTCCAAGAAGGAAGGCTATGACTTGCGTCTGGATGCCATTCCAA 

TCCAAGCAGCCAAGGCTTCAAGAGATATTGCTAGTGATTACAAGTACAAGGAAGGCTACCGCA 

AACAGCTTGGCCACCATATTGGGGCCCGGAACATTAAGGATGACCCGAAGATGATGTGGTCCA 

TCCATGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGAGTTTGAGAAGTGGAAGACCA 

AGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGCTGGCCAAGAAGTGTCAGATCCTTGTAA 

GCGACATAGACTACAAGCATCCCCTGCATGAATGGACCTGCCTGCCTGATCAGAATGACGTCA 

TTCAGGCTCGGAAGGCCTATGACCTGCAGAGTGATGCTATTTACAAATCTGATCTTGAGTGGC 

TGAGAGGCATAGGATGGGTTCCCATTGGCTCTGTAGAGGTCGAGAAAGTGAAGAGAGCTGGAG 

AAATCCTGAGTGACAGGAAGTATCGCCAGCCTGCAGACCAGCTCAAATTCACATGCATTACCG 

ACACTCCGGAAATTGTCCTAGCAAAGAATAATGCCCTGACAATGAGCAAGCATTTATACACAG

AAGCTTGGGATGCTGACAAAACCTCCATCCACGTGATGCCAGACACCCCAGATATCCTGCTGG 

CCAAGAGTAATTCTGCCAATATCAGCCAAAAACTTTACACCAAGGGATGGGATGAATCAAAGA

TGAAGGACTATGATCTGAGAGCAGATGCTATTTCCATCAAAAGTGCCAAGGCCTCCAGGGACA

TCGCCAGTGACTACAAATACAAGGAAGCCTATGAGAAACAGAAAGGCCACCACATTGGAGCCC 

AGAGCATTGAAGATGATCCCAAGATTATGTGTGCCATACATGCAGAAAAAATTCAAAGTGAAA 

GGGAGTACAAGAAGGAATTCCAAAAGTGGAAAACCAAGTTCTCTAGCCCAGTGGACATGTTAA 

GCATCTTGCTGGCCAAGAAATGTCAGACTTTGGTCACTGACATTTATTATCGCAATTACCTGC 

ATGAATGGACATGCATGCCGGATCAAAACGACATTATCCAAGCAAAAAAGGCCTATGACCTGC 

AGAGTGATGCCCTCTACAAGGCTGACTTGGAGTGGTTGCGTGGCATTGGCTGGATGCCCCAAG 

GGTCTCCTGAAGTGTTGAGAGTCAAAAACGCCCAGAATATCTTTTGTGACAGTGTCTATCGGA 

CGCCTGTGGTGAACCTTAAGTACACAAGCATTGTTGACACACCTGAAGTGGTCCTTGCTAAAT

CAAATGCTGAAAATATTAGTATTCCAAAGTACAGAGAGGTTTGGGACAAGGATAAAACTTCAA 

TACACATAATGCCAGATACTCCAGAAATTAATCTCGCTAGAGCAAATGCTCTTAATGTGAGCA 

ATAAACTTTACCGTGAGGGCTGGGATGAAATGAAGGCGGGCTGTGATGTCCGGCTGGATGCCA 

TCCCCATCCAGGCTGCCAAGGCCTCCAGGGAGATTGCCAGTGACTATAAATATAAGCTTGACC 

ATGAGAAGCAGAAGGGACACTACGTGGGCACCCTCACAGCCAGGGATGACAACAAGATCCGCT 

GGGCCCTCATAGCTGACAAGCTCCAGAATGAACGAGAGTACCGGCTGGACTGGGCCAAATGGA 

AGGCCAAGATCCAGAGCCCTGTGGACATGCTTTCCATCCTGCACTCTAAAAATTCCCAGGCTC 

TGGTCAGTGACATGGATTACCGCAATTACCTGCACCAGTGGACCTGCATGCCCGACCAGAACG 

ATGTGATTCAGGCCAAGAAGGCCTACGAACTGCAGAGCGATAATGTTTACAAGGCTGACTTGG

AATGGTTGCGTGGAATTGGGTGGATGCCAAATGACTCCGTGTCCGTCAATCATGCCAAACATG

CCGCGGACATCTTCAGTGAGAAAAAATATCGCACAAAAATAGAAACTCTCAACTTTACGCCTG 

TGGATGACAGAGTTGATTATGTGACAGCGAAACAAAGTGGCGAGATCCTCGATGATATTAAAT

ACCGGAAAGACTGGAATGCCACCAAATCAAAGTACACCCTCACAGAAACCCCCCTGCTGCACA 

CTGCCCAGGAGGCTGCTAGGATACTGGACCAGTATCTCTACAAGGAAGGCTGGGAGAGACAAA 

AAGCCACAGGTTACATTTTGCCTCCAGATGCTGTGCCATTTGTTCATGCCCATCACTGCAATG 

ACGTTCAGAGTGAGCTGAAATACAAAGCTGAACATGTGAAGCAAAAAGGTCATTATGTTGGTG
TCCCGACGATGAGAGATGATCCTAAGCTGGTTTGGTTTGAGCATGCAGGCCAGATTCAGAATG
AGAGACTATACAAAGAGGACTATCACAAAACAAAGGCCAAAATCAATATACCTGCTGATATGG
TGTCAGTCTTGGCCGCCAAGCAGGGGCAGACCCTTGTCAGTGATATTGATTATCGTAATTACT 
TGCACCAATGGATGTGTCATCCTGACCAGAACGATGTTATTCAGGCAAGAAAGGCCTATGACC 
TACAGAGTGATAATGTCTACAGAGCTGACCTGGAGTGGCTCCGAGGCATTGGCTGGATCCCAC 
TGGATTCTGTGGACCATGTAAGGGTTACTAAGAACCAGGAAATGATGAGTCAGATCAAATATA 
AGAAAAATGCCCTTGAAAACTATCCTAACTTTACAAGTGTGGTGGATCCTCCAGAGATTGTTT 
TAGCCAAGATTAATTCTGTCAATCAAAGTGATGTAAAATATAAAGAAACATTTAATAAAGCAA 
AGGGCAAATATACGTTTTCACCAGATACACCACATATCTCCCACTCCAAAGACATGGGAAAAC 
TCTACAGTACTATACTGTATAAAGGGGCGTGGGAGGGCACCAAGGCCTATGGCTACACCCTGG 
ATGAGCGCTACATTCCCATTGTTGGAGCCAAGCATGCTGATCTGGTGAACAGTGAGCTTAAAT 
ACAAAGAGACATATGAGAAGCAGAAAGGTCACTACCTGGCTGGAAAAGTGATCGGTGAATTCC 
CTGGTGTGGTTCACTGTCTGGATTTCCAAAAGATGAGGAGTGCGTTGAACTACAGAAAACATT 
ATGAGGATACCAAAGCAAATGTTCATATCCCCAATGACATGATGAATCACGTGCTGGCTAAAA 
GGTGCCAGTACATCCTCAGTGACCTGGAGTATCGACACTATTTCCACCAGTGGACGTCTCTTC 
TGGAAGAACCCAATGTTATACGCGTCCGAAACGCCCAGGAGATCTTGAGTGATAATGTGTATA 
AAGATGACCTGAATTGGTTGAAAGGCATTGGTTGCTACGTTTGGGATACACCCCAAATCCTCC 
ATGCCAAGAAATCATACGACCTTCAGAGTCAGCTACAATATACAGCAGCAGGTAAAGAAAATC 
TACAAAACTATAATCTGGTCACAGACACGCCCCTCTATGTGACTGCTGTTCAGAGTGGCATTA
ATGCCAGTGAGGTAAAATATAAAGAAAATTATCATCAGATTAAGGACAAATACACAACAGTTC 
TAGAAACAGTGGATTATGACAGAACCAGAAACCTGAAGAATCTTTACAGCAGTAACCTGTACA 
AGGAGGCCTGGGATAGAGTGAAAGCCACCAGCTACATCCTGCCTTCCAGCACCTTGTCCCTGA 
CACACGCCAAGAACCAGAAGCATCTGGCCAGCCATATCAAATATCGGGAAGAATATGAAAAGT 
TCAAAGCTCTTTATACGTTACCAAGAAGTGTTGACGATGATCCGAACACAGCACGGTGCCTCC 
GAGTTGGCAAGCTTAACATCGATCGCCTGTACAGATCAGTTTATGAAAAGAACAAGATGAAAA
TCCACATCGTGCCCGACATGGTAGAGATGGTTACTGCCAAGGATTCCCAGAAGAAAGTCAGTG
AGATTGATTACCGCCTGCGCCTCCACGAATGGATTTGCCACCCCGACTTGCAAGTCAATGATC
ACGTCAGGAAAGTCACAGATCAGATCAGCGATATTGTATACAAGGATGACCTCAACTGGCTGA
AAGGCATTGGTTGCTACGTCTGGGACACTCCTGAAATCCTCCATGCCAAGCATGCTTATGATC 
TACGTGATGATATCAAGTATAAAGCTCACATGTTGAAAACAAGGAATGACTACAAGCTTGTCA 
CAGATACACCAGTCTACGTGCAGGCTGTCAAAAGTGGGAAACAGCTAAGTGACGCTGTCTACC 
ACTATGACTATGTGCACAGTGTCAGAGGCAAAGTGGCTCCAACTACCAAAACCGTGGATCTGG 
ACCGGGCCCTTCATGCATACAAGCTCCAGAGTTCGAATCTATACAAAACCAGCCTGCGCACCC 
TGCCCACTGGATATAGACTTCCAGGTGACACTCCTCACTTCAAACACATCAAGGACACCCGTT 
ACATGAGCAGTTATTTCAAGTACAAAGAAGCCTATGAACACACCAAGGCATATGGGTATACAC 
TTGGCCCCAAAGATGTTCCATTTGTCCACGTCCGGAGAGTCAACAATGTTACCAGCGAGAGAC 
TGTATCGGGAATTGTACCACAAACTGAAAGACAAGATCCATACAACTCCCGATCCCCCTGAGA 
TCCGCCAAGTCAAGAAGACACAAGAGGCTGTCAGTGAGTTGATCTACAAATCAGACTTCTTCA
AGATGCAGGGCCACATGATCTCTCTGCCATACACACCCCAAGTGATCCATTGCCGCTATGTGG
GAGACATCACCAGTGATATTAAATACAAAGAGGACTTGCAGGTCCTGAAGGGATTTGGCTGCT 
TCCTGTATGACACTCCTGACATGGTCCGCTCCCGGCACCTGCGGAAGCTCTGGTCTAATTACC 
TATACACTGATAAGGCAAGGGAGATGCGAGACAAATACAAAGTGGTGCTTGACACTCCAGAAT 
ACAGAAAAGTGCAAGAACTGAAGACACATCTGAGTGAGCTGGTCTACAGAGCTGCAGGCAAGA 
AGCAGAAGTCAATCTTTACTTCAGTTCCTGATACTCCTGATCTTTTAAGAGCCAAGCGAGGGC 
AGAAGCTTCAGAGTCAGTATCTGTATGTTGAACTTGCCACCAAAGAGAGACCCCATCATCACG 
CTGGAAACCAGACCACAGCCTTGAAGCATGCTAAAGACGTGAAGGACATGGTCAGTGAGAAAA 
AGTACAAGATTCAATATGAAAAGATGAAAGACAAGTACACTCCGGTTCCAGATACGCCAATCC 
TCATCAGAGCCAAGAGGGCTTACTGGAATGCCAGTGATCTACGCTACAAAGAAACATTTCAAA 
AGACCAAAGGGAAATACCACACGGTGAAAGATGCCCTAGACATTGTCTATCATCGCAAAGTCA
CAGATGACATCAGTAAAATAAAATACAAGGAGAACTACATGAGCCAGTTGGGTATCTGGAGGT 
CCATTCCTGATCGTCCAGAGCATTTCCACCACCGAGCAGTCACTGACACAGTCAGTGATGTAA 
AATATAAAGAAGACTTGACTTGGCTTAAAGGCATTGGTTGCTATGCCTATGATACCCCTGATT
TCACTCTGGCTGAAAAGAACAAGACTCTCTACAGCAAGTATAAGTATAAAGAAGTATTTGAAA 
GGACAAAGTCAGATTTCAAGTATGTTGCCGACTCTCCGATCAATAGGCATTTCAAGTATGCAA 
CTCAATTGATGAATGAGAAAAAATACAGAGCTGATTATGAGCAGCGGAAAGATAAATACCACC 
TGGTAGTCGATGAGCCTAGACATCTGCTGGCTAAGACCCGCAGCGACCAGATCAGTCAGATCA 
AATACAGGAAAAACTATGAAAAATCAAAGGACAAATTTACCTCAATTGTGGATACTCCAGAAC 
ACCTGCGTACTACAAAAGTCAACAAACAAATCAGCGATATCCTTTATAAATTGGAATACAACA 
AGGCCAAACCCAGAGGCTACACCACAATCCACGACACGCCCATGTTGCTGCATGTCCGCAAGG
TTAAAGATGAAGTCAGTGATCTGAAATACAAAGAAGTATACCAAAGAAATAAATCCAACTGCA
CCATTGAGCCAGATGCTGTTCATATCAAAGCAGCCAAGGACGCCTACAAAGTCAACACCAATC 
TGGACTATAAGAAACAGTACGAAGCCAACAAAGCCCACTGGAAGTGGACTCCTGACCGACCGG
ACTTCCTCCAGGCTGCCAAGTCATCCCTGCAGCAAAGCGATTTTGAATATAAGCTGGACCGGG 
AGTTCCTCAAGGGTTGCAAGCTTTCTGTCACTGATGACAAAAACACGGTGCTCGCCCTCAGGA 
ATACTTTAATAGAAAGTGATCTGAAATACAAAGAGAAACATGTCAAGGAAAGAGGAACCTGCC 
ATGCCGTACCTGACACGCCTCAGATCCTGCTGGCGAAGACTGTCAGCAACCTGGTGTCTGAGA





ACAAGTACAAGGACCATGTCAAGAAGCACTTGGCACAGGGCTCATACACAACACTACCAGAGA 

CCCGGGACACTGTTCACGTCAAGGAAGTGACCAAGCATGTCAGTGATACAAATTACAAAAAGA 

AGTTTGTCAAGGAGAAAGGAAAATCCAACTACTCCATCATGCTGGAGCCACCAGAGGTGAAAC 

ATGCTATGGAAGTGGCCAAGAAGCAAAGTGATGTCGCTTACAGAAAAGATGCCAAAGAGAACC 

TGCATTACACCACAGTGGCTGATCGACCAGACATCAAGAAGGCCACACAGGCAGCCAAACAGG 

CCAGTGAGGTGGAGTACAGAGCCAAGCACCGCAAGGAAGGCAGCCATGGCTTAAGCATGCTCG 

GTCGCCCAGACATAGAAATGGCCAAGAAGGCAGCCAAGCTGAGCAGCCAGGTTAAATACCGAG 

AAAATTTCGATAAAGAAAAGGGCAAGACACCAAAATACAATCCAAAAGACAGCCAGCTCTACA 

AAGTCATGAAAGATGCTAATAATCTTGCAAGTGAGGTTAAATACAAGGCTGACCTGAAGAAAC 

TTCACAAACCCGTGACTGACATGAAGGAGTCTCTGATCATGAATCATGTCCTGAATACAAGCC

AACTTGCCAGTTCTTACCAGTACAAGAAGAAGTATGAGAAGAGTAAAGGCCACTACCACACCA

TACCCGATAATCTGGAGCAGCTTCACCTAAAAGAGGCCACAGAATTACAGAGTATAGTGAAAT 

ACAAAGAAAAGTATGAAAAGGAACGAGGAAAACCCATGCTGGACTTTGAAACACCAACGTACA 

TCACTGCCAAAGAGTCTCAGCAGATGCAGAGTGGGAAAGAATATAGGAAAGATTATGAAGAGT 

CCATTAAAGGCAGAAACCTGACTGGCCTGGAGGTCACGCCAGCTTTGTTACATGTCAAATATG 

CAACTAAAATAGCAAGCGAGAAAGAGTACAGGAAAGATCTAGAGGAAAGCATCCGTGGGAAGG 

GCCTCACTGAAATGGAAGATACACCTGACATGCTAAGAGCAAAGAATGCCACTCAAATCCTCA 

ATGAGAAAGAATATAAGCGAGACCTGGAACTGGAAGTCAAAGGAAGAGGCCTGAATGCCATGG 

CCAATGAAACTCCGGATTTTATGAGGGCCAGGAATGCTACTGATATTGCCAGTCAGATTAAGT 

ATAAGCAATCAGCAGAAATGGAGAAAGCCAATTTCACTTCTGTGGTTGATACTCCAGAGATCA

TTCATGCCCAACAAGTCAAGAATCTTTCAAGCCAGAAAAAGTACAAGGAAGATGCTGAGAAGT 

CCATGTCGTATTATGAGACTGTTTTGGACACCCCAGAGATACAGAGAGTCCGGGAGAACCAAA 

AGAACTTCAGCCTTCTCCAATACCAGTGTGACCTTAAAAACAGTAAAGGAAAAATTACAGTTG 

TTCAAGACACGCCAGAAATACTGCGTGTAAAAGAAAATCAGAAGAATTTCAGCTCGGTTTTAT 

ATAAAGAGGATGTCTCACCAGGAACGGCTATCGGAAAGACACCTGAGATGATGAGAGTGAAAC 

AAACACAGGACCACATTAGCTCGGTGAAGTATAAGGAAGCAATAGGACAAGGAACTCCAATCC

CTGACCTGCCTGAAGTGAAACGTGTGAAGGAGACGCAGAAGCACATTAGCTCGGTTATGTACA 

AAGAAAACTTGGGAACAGGCATTCCAACCACTGTGACTCCAGAGATTGAGAGAGTCAAACGCA 

ATCAAGAGAACTTTAGCTCGGTTTTGTACAAAGAAAATTTGGGGAAAGGAATCCCAACACCTA 

TCACTCCAGAGATGGAGAGAGTCAAACGCAATCAAGAGAACTTTAGCTCGGTGTTATACAAAG 

AAAACATGGGCAAGGGAACTCCTTTACCTGTCACTCCCGAGATGGAGCGAGTCAAACACAATC

AAGAAAATATTAGCTCGGTTTTGTACAAAGAAAATGTGGGGAAAGCCACCGCAACCCCTGTCA

CTCCTGAGATGCAGAGAGTCAAACGCAATCAAGAAAACATTAGCTCGGTGTTATACAAAGAGA 

ACCTGGGGAAAGCAACCCCCACACCCTTTACTCCTGAGATGGAAAGAGTGAAACGCAATCAAG 

AAAACTTTAGCTCGGTATTGTACAAAGAGAACATGAGAAAAGCAACTCCGACACCTGTTACTC 

CAGAGATGGAGAGAGCTAAGCGCAACCAAGAAAACATTAGCTCGGTTCTTTATTCTGATAGTT 

TCCGGAAACAAATACAAGGCAAAGCTGCCTATGTATTGGATACCCCCGAGATGAGACGGGTGA 

GGGAGACCCAACGGCACATCTCAACGGTGAAATATCATGAAGACTTTGAGAAACACAAGGGTT 

GCTTCACACCAGTGGTGACAGATCCTATCACTGAACGAGTAAAGAAGAACATGCAGGACTTCA 

GTGACATTAACTACCGAGGTATTCAGAGGAAAGTGGTAGAAATGGAACAAAAACGGAATGACC 

AAGATCAGGAGACTATTACAGGTTTACGTGTCTGGCGTACTAATCCTGGTTCGGTTTTTGACT 

ATGATCCAGCAGAAGACAACATCCAGTCCCGAAGCTTACACATGATTAATGTCCAAGCTCAGC 

GCCGGAGCCGGGAGCAGTCACGATCTGCCAGTGCACTAAGCGTCAGTGGGGGTGAGGAGAAGT 

CTGAGCATTCAGAAGCACCAGACCACCACCTTTCGACTTACAGCGACGGGGGTGTCTTTGCAG 

TCTCAACAGCTTACAAACATGCAAAAACCACAGAGCTCCCACAACAACGATCATCTTCAGTTG

CTACCCAACAGACAACGGTATCTTCCATCCCATCTCATCCATCTACTGCTGGAAAAATCTTCC

GTGCCATGTATGACTATATGGCTGCTGATGCAGATGAGGTGTCCTTCAAGGATGGAGATGCCA 

TCATAAATGTTCAAGCAATTGATGAAGGCTGGATGTATGGCACTGTGCAGAGGACTGGCAGGA 

CCGGAATGCTCCCAGCCAACTACGTTGAAGCTATTTAGGCATTTCAAAGCATCACACTTGTCT 

GCAGGACTTACAGATCCTGCAGTCAATGTTTCGGTTTAGACTCTCCACTGTTACCTAAGTTCT 


CAAGCTGCCTATGGTTTTTCTGTGTCAATGTGATTTATGGTAGTACCATCCTTTCTCCTTTGG 


GTTTTAAAATAAGTTGCAGAACAGACACTTTAAAAGCTTCTGCAATATTATTTCTGTGCCTAG 


AGTCTTTCTCCATTATAAACATGTTTTAACATTATTTCTTTTCTAAAACAGGGATTTTGAATA 


TGCCAAACACATTAAAGGAAAAATAGCAGAGATGTTCACCTTTTCCTTGCTGATTGCTAATGC 




TTATTATTTCTAATTCAGTTCTGAAGTTATAAACTTATAATCAATACAAACCAGCAACTAATA 


AAACCTCTAATTCTGCAAAAAAAAAAAA




ORF Start: ATG at 441 


ORF Stop: TAG at 20448




SEQ ID NO: 142 


6669 aa MWatkD 


NOV35b, 

CG119566-02 Protein
Sequence 


MADDEDYEEVVEYYTEEVVYEEVPGETITKIYETTTTRTSDYEQSETSKPALAQPALAQPASA
KPVERRKVIRKKVDPSKFMTPYIAHSQKMQDLFSPNKYKEKFEKTKGQPYASTTDTPELRRIK
KVQDQLSEVKYRMDGDVAKTICHVDEKAKDIEHAKKVSQQVSKVLYKQNWEDTKDKYLLPPDA
PELVQAVKNTAMFSKKLYTEDWEADKSLFYPYNDSPELRRVAQAQKALSDVAYKKGLAEQQAQ
FTPLADPPDIEFAKKVTNQVSKQKYKEDYENKIKGKWSETPCFEVANARMNADNISTRKYQED
FENMKDQIYFMQTETPEYKMNKKAGVAASKVKYKEDYEKNKGKADYNVLPASENPQLRQLKAA 
GDALSDKLYKENYEKTKAKSINYCETPKFKLDTVLQNFSSDKKYKDSYLKDILGHYVGSFEDP
YHSHCMKVTAQNSDKNYKAEYEEDRGKGFFPQTITQEYEAIKKLDQCKDHTYKVHPDKTKFTQ
VTDSPVLLQAQVNSKQLSDLNYKAKHESEKFKCHIPPDTPAFIQHKVNAYNLSDNLYKQDWEK 
SKAKKFDIKVDAIPLLAAKANTKNTSDVMYKKDYEKNKGKMIGVLSINDDPKMLHSLKVAKNQ 
SDRLYKENYEKTKAKSMNYCETPKYQLDTQLKNFSEARYKDLYVKDVLGHYVGSMEDPYHTHC 
MKVAAQNSDKSYKAEYEEDKGKCYFPQTITQEYDAIKKLDQCKDHTYKVHPDKTKFTAVTDSP 
VLLQAQLNTKQLSDLNYKAKHEGERFKCHIPADAPQFIQHRVNAYNLSDNVYKQDWEKSKAKK 
FDIKVDAIPLLAAKANTKNTSDVMYKKDYEKSKGKMIGALSINDDPKMLHSLKTAKNQSDREY
RKDYEKSKTIYTAPLDMLQVTQAKKSQAIASDVDYKHILHSYSYPPDSINVDLAKKAYALQSD 
VEYKADYNSWMKGCGWVPFGSLEMEKAKRASDILNEKKYRQHPDTLKFTSIEDAPITVQSKIN 
QAQRSDIAYKAKGEEIIHNYNLPPDLPQFIQAKVNAYNISENMYKADLKDLSKKGYDLRTDAI 
PIRAAKAARQAASDVQYKKDYEKAKGKMVGFQSLQDDPKLVHYMNVAKIQSDREYKKDYEKTK 
SKYNTPHDMFNVVAAKKAQDVVSNVNYKHSLHHYTYLPDAMDLELSKNMMQIQSDNVYKEDYN

NWMKGIGWIPIGSLDVEKVKKAGDALNEKKYRQHPDTLKFTSIVDSPVMVQAKQNTKQVSDIL
YKAKGEDVKHKYTMSPDLPQFLQAKCNAYSISDVCYKRDWHDLIRKGNNVLGDAIPITAAKAS
RNIASDYKYKEAYEKSKGKHVGFRSLQDDPKLVHYMNVAKLQSDREYKKNYENTKTSYHTPGD
MVTITAAKMAQDVATNVNYKQPLHHYTYLPDAMSLEHTRNVNQIQSDNVYKDEYNSFLKGIGW 
IPIGSLEVEKVKKAGDALNERKYRQHPDTVKFTSVPDSMGMMLAQHNTKQLSDLNYKVEGEKL
KHKYTIDPELPQFIQAKVNALNMSDAHYKADWKKTIRKGYDLRPDAIPIVAAKSSRNIASDCK 
YKEAYEKAKGKQVGFLSLQDDPKLVHYMNVAKIQSDREYKKGYEASKTKYHTPLDMVSVTAAK 
KSQEVATNANYRQSYHHYTLLPDALNVEHSRNAMQIQSDNLYKSDFTNWMKGIGWVPIESLEV 
EKAKKAGEILSEKKYRQHPEKLKFTYAMDTMEQALNKSNKLNMDKRLYTEKWNKDKTTIHVMP
DTPDILLSRVNQITMSDKLYKAGWEEEKKKGYDLRPDAIAIKAARASRDIASDYKYKKAYEQA 
KGKHIGFRSLEDDPKLVHFMQVAKMQSDREYKKGYEKSKTSFHTPVDMLSVVAAKKSQEVATN
ANYRNVIHTYNMLPDAMSFELAKNMMQIQSDNQYKADYADFMKGIGWLPLGSLEAEKNKKAME 
IISEKKYRQHPDTLKYSTLMDSMNMVLAQNNAKIMNEHLYKQAWEADKTKVHIMPDIPQIILA
KANAINISDKLYKLSLEESKKKGYDLRPDAIPIKAAKASRDIASDYKYKYNYEKGKGKMVGFR 
SLEDDPKLVHSMQVAKMQSDREYKKNYENTKTSYHTPADMLSVTAAKDAQANITNTNYKHLIH 
KYILLPDAMNIELTRNMNRIQSDNEYKQDYNEWYKGLGWSPAGSLEVEKAKKATEYASDQKYR 
QHPSNFQFKKLTDSMDMVLAKQNAHTMNKHLYTIDWNKDKTKIHVMPDTPDILQAKQNQTLYS 
QKLYKLGWEEALKKGYDLPVDAISVQLAKASRDIASDYKYKQGYRKQLGHHVGFRSLQDDPKL 
VLSMNVAKMQSEREYKKDFEKWKTKFSSPVDMLGVVLAKKCQELVSDVDYKNYLHQWTCLPDQ
NDVVQAKKVYELQSENLYKSDLEWLRGIGWSPLGSLEAEKNKRASEIISEKKYRQPPDRNKFT
SIPDAMDIVLAKTNAKNRSDRLYREAWDKDKTQIHIMPDTPDIVLAKANLINTSDKLYRMGYE 
ELKRKGYDLPVDAIPIKAAKASREIASEYKYKEGFRKQLGHHIGARNIEDDPKMMWSMHVAKI 
QSDREYKKDFEKWKTKFSSPVDMLGVVLAYKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQA
YDLQSDNLYKSDLQWLKGIGWMTSGSLEDEKNKRATQILSDHVYRQHPDQFKFSSLMDSIPMV 
LAKNNAITMNHRLYTEAWDKDKTTVHIMPDTPEVLLAKQNKVNYSEKLYKLGLEEAKRKGYDM 
RVDAIPIKAAKASRDIASEFKYKEGYRKQLGHHIGARAIRDDPKMMWSMHVAKIQSDREYKKD 

FEKWKTKFSSPVDMLGVVLAKKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQAYDLQSDNMY
KSDLQWMRGIGWVSIGSLDVEKCKRATEILSDKIYRQPPDRFKFTSVTDSLEQVLAKNNALNM
NKRLYTEAWDKDKTQIHIMPDTPEIMLARQNKINYSETLYKLANEEAKKKGYDLRSDAIPIVA 
AKASRDVISDYKYKDGYRKQLGHHIGARNIEDDPKMMWSMHVAKIQSDREYKKDFEKWKTKFS 
SPVDMLGVVLAKKCQTLVSDVDYKNYLHEWTCLPDQNDVIHARQAYDLQSDNIYKSDLQWLRG
IGWVPIGSMDVVKCKRAAEILSDNIYRQPPDKLKFTSVTDSLEQVLAKNNALNMNKRLYTEAW
DKDKTQVHIMPDTPEIMLARQNKINYSESLYRQAMEEAKKEGYDLRSDAIPIVAAKASRDIAS 
DYKYKEAYRKQLGHHIGARAVHDDPKIMWSLHIAKVQSDREYKKDFEKYKTRYSSPVDMLGIV 
LAKKCQTLVSDVDYKHPLHECICLPDQNDIIHARKAYDLQSDNLYKSDLEWMKGIGWVPIDSL 

EVVRAKRAGELLSDTIYRQRPETLKFTSITDTPEQVLAKNNALNMNKRLYTEAWDNDKKTIHV
MPDTPEIMLAKLNRINYSDKLYKLALEESKKEGYDLRLDAIPIQAAKASRDIASDYKYKEGYR 
KQLGHHIGARNIKDDPKMMWSIHVAKIQSDREYKKEFEKWKTKFSSPVDMLGVVLAKKCQILV 
SDIDYKHPLHEWTCLPDQNDVIQARKAYDLQSDAIYKSDLEWLRGIGWVPIGSVEVEKVKRAG 
EILSDRKYRQPADQLKFTCITDTPEIVLAKNNALTMSKHLYTEAWDADKTSIHVMPDTPDILL 
AKSNSANISQKLYTKGWDESKMKDYDLRADAISIKSAKASRDIASDYKYKEAYEKQKGHHIGA 

QSIEDDPKIMCAIHAEKIQSEREYKKEFQKWKTKFSSPVDMLSILLAKKCQTLVTDIYYRNYL
HEWTCMPDQNDIIQAKKAYDLQSDALYKADLEWLRGIGWMPQGSPEVLRVKNAQNIFCDSVYR 
TPVVNLKYTSIVDTPEVVLAKSNAENISIPKYREVWDKDKTSIHIMPDTPEINLARANALNVS
NKLYREGWDEMKAGCDVRLDAIPIQAAKASREIASDYKYKLDHEKQKGHYVGTLTARDDNKIR
WALIADKLQNEREYRLDWAKWKAKIQSPVDMLSILHSKNSQALVSDMDYRNYLHQWTCMPDQN 
DVIQAKKAYELQSDNVYKADLEWLRGIGWMPNDSVSVNHAKHAADIFSEKKYRTKIETLNFTP 
VDDRVDYVTAKQSGEILDDIKYRKDWNATKSKYTLTETPLLHTAQEAARILDQYLYKEGWERQ
KATGYILPPDAVPFVHAHHCNDVQSELKYKAEHVKQKGHYVGVPTMRDDPKLVWFEHAGQIQN 
ERLYKEDYHKTKAKINIPADMVSVLAAKQGQTLVSDIDYRNYLHQWMCHPDQNDVIQARKAYD 
LQSDNVYRADLEWLRGIGWIPLDSVDHVRVTKNQEMMSQIKYKKNALENYPNFTSVVDPPEIV
LAKINSVNQSDVKYKETFNKAKGKYTFSPDTPHISHSKDMGKLYSTILYKGAWEGTKAYGYTL 
DERYIPIVGAKHADLVNSELKYKETYEKQKGHYLAGKVIGEFPGVVHCLDFQKMRSALNYRKH

YEDTKANVHIPNDMMNHVLAKRCQYILSDLEYRHYFHQWTSLLEEPNVIRVRNAQEILSDNVY
KDDLNWLKGIGCYVWDTPQILHAKKSYDLQSQLQYTAAGKENLQNYNLVTDTPLYVTAVQSGI
NASEVKYKENYHQIKDKYTTVLETVDYDRTRNLKNLYSSNLYKEAWDRVKATSYILPSSTLSL
THAKNQKHLASHIKYREEYEKFKALYTLPRSVDDDPNTARCLRVGKLNIDRLYRSVYEKNKMK
IHIVPDMVEMVTAKDSQKKVSEIDYRLRLHEWICHPDLQVNDHVRKVTDQISDIVYKDDLNWL
KGIGCYVWDTPEILHAKHAYDLRDDIKYKAHMLKTRNDYKLVTDTPVYVQAVKSGKQLSDAVY
HYDYVHSVRGKVAPTTKTVDLDRALHAYKLQSSNLYKTSLRTLPTGYRLPGDTPHFKHIKDTR
YMSSYFKYKEAYEHTKAYGYTLGPKDVPFVHVRRVNNVTSERLYRELYHKLKDKIHTTPDPPE
IRQVKKTQEAVSELIYKSDFFKMQGHMISLPYTPQVIHCRYVGDITSDIKYKEDLQVLKGFGC
FLYDTPDMVRSRHLRKLWSNYLYTDKAREMRDKYKVVLDTPEYRKVQELKTHLSELVYRAAGK
KQKSIFTSVPDTPDLLRAKRGQKLQSQYLYVELATKERPHHHAGNQTTALKHAKDVKDMVSEK
KYKIQYEKMKDKYTPVPDTPILIRAKRAYWNASDLRYKETFQKTKGKYHTVKDALDIVYHRKV
TDDISKIKYKENYMSQLGIWRSIPDRPEHFHHRAVTDTVSDVKYKEDLTWLKGIGCYAYDTPD
FTLAEKNKTLYSKYKYKEVFERTKSDFKYVADSPINRHFKYATQLMNEKKYRADYEQRKDKYH
LVVDEPRHLLAKTRSDQISQIKYRKNYEKSKDKFTSIVDTPEHLRTTKVNKQISDILYKLEYN
KAKPRGYTTIHDTPMLLHVRKVKDEVSDLKYKEVYQRNKSNCTIEPDAVHIKAAKDAYKVNTN
LDYKKQYEANKAHWKWTPDRPDFLQAAKSSLQQSDFEYKLDREFLKGCKLSVTDDKNTVLALR
NTLIESDLKYKEKHVKERGTCHAVPDTPQILLAKTVSNLVSENKYKDHVKKHLAQGSYTTLPE
TRDTVHVKEVTKHVSDTNYKKKFVKEKGKSNYSIMLEPPEVKHAMEVAKKQSDVAYRKDAKEN
LHYTTVADRPDIKKATQAAKQASEVEYRAKHRKEGSHGLSMLGRPDIEMAKKAAKLSSQVKYR
ENFDKEKGKTPKYNPKDSQLYKVMKDANNLASEVKYKADLKKLHKPVTDMKESLIMNHVLNTS
QLASSYQYKKKYEKSKGHYHTIPDNLEQLHLKEATELQSIVKYKEKYEKERGKPMLDFETPTY
ITAKESQQMQSGKEYRKDYEESIKGRNLTGLEVTPALLHVKYATKIASEKEYRKDLEESIRGK
GLTEMEDTPDMLRAKNATQILNEKEYKRDLELEVKGRGLNAMANETPDFMRARNATDIASQIK
YKQSAEMEKANFTSVVDTPEIIHAQQVKNLSSQKKYKEDAEKSMSYYETVLDTPEIQRVRENQ
KNFSLLQYQCDLKNSKGKITVVQDTPEILRVKENQKNFSSVLYKEDVSPGTAIGKTPEMMRVK
QTQDHISSVKYKEAIGQGTPIPDLPEVKRVKETQKHISSVMYKENLGTGIPTTVTPEIERVKR
NQENFSSVLYKENLGKGIPTPITPEMERVKRNQENFSSVLYKENMGKGTPLPVTPEMERVKHN
QENISSVLYKENVGKATATPVTPEMQRVKRNQENISSVLYKENLGKATPTPFTPEMERVKRNQ
ENFSSVLYKENMRKATPTPVTPEMERAKRNQENISSVLYSDSFRKQIQGKAAYVLDTPEMRRV
RETQRHISTVKYHEDFEKHKGCFTPVVTDPITERVKKNMQDFSDINYRGIQRKVVEMEQKRND
QDQETITGLRVWRTNPGSVFDYDPAEDNIQSRSLHMINVQAQRRSREQSRSASALSVSGGEEK
SEHSEAPDHHLSTYSDGGVFAVSTAYKHAKTTELPQQRSSSVATQQTTVSSIPSHPSTAGKIF
RAMYDYMAADADEVSFKDGDAIINVQAIDEGWMYGTVQRTGRTGMLPANYVEAI




SEQ ID NO: 143


20194 bp


NOV35c, 

CG119566-03 DNA
Sequence 


CCACTACTACTCTGAAAAATGGCAGATGACGAAGACTATGAGGAGGTGGTGGAGTACTACACA 
GAAGAAGTGGTTTACGAAGAGGTGCCGGGAGAGACAATAACAAAAATTTATGAGACTACGACA 
ACAAGGACATCTGACTATGAGCAATCAGAAACTTCCAAACCAGCTCTGGCACAGCCAGCACTG
GCACAGCCAGCATCAGCAAAGCCGGTGGAGAGGAGGAAGGTCATCCGGAAGAAAGTGGATCCT
TCAAAGTTCATGACCCCCTACATTGCACACAGTCAGAAAATGCAGGATCTTTTTAGCCCAAAT 
AAATACAAGGAGAAGTTTGAGAAAACAAAAGGACAGCCATACGCCAGCACAACAGATACTCCA 
GAACTTCGCAGAATCAAAAAAGTACAAGATCAACTCAGTGAGGTTAAGTATCGAATGGATGGT 
GATGTTGCTAAGACTATATGTCACGTAGATGAAAAAGCAAAGGATATTGAACATGCAAAGAAA 
GTGTCGCAGCAAGTCAGTAAGGTTTTATACAAGCAGAACTGGGAAGACACCAAGGATAAGTAC 
CTGCTTCCTCCTGATGCCCCTGAACTTGTCCAGGCCGTTAAGAACACCGCCATGTTCAGCAAG 
AAACTGTACACTGAAGACTGGGAAGCAGACAAAAGTTTGTTTTACCCCTATAATGATAGCCCG 
GAACTGAGGAGAGTTGCCCAGGCCCAGAAAGCTCTCAGTGATGTTGCCTACAAAAAAGGTCTC 
GCTGAACAGCAAGCTCAATTCACGCCTCTGGCCGATCCTCCAGATATAGAATTTGCCAAGAAA 
GTAACCAATCAAGTGAGCAAGCAAAAATACAAAGAAGACTATGAAAATAAAATCAAAGGCAAA 
TGGAGTGAGACACCTTGCTTTGAAGTTGCAAATGCCAGAATGAATGCTGATAACATTAGCACA
AGGAAATACCAGGAAGATTTTGAAAACATGAAAGACCAGATCTACTTCATGCAGACCGAAACA 
CCAGAGTATAAAATGAATAAAAAAGCTGGTGTGGCAGCTAGCAAGGTAAAATACAAAGAAGAC
TATGAAAAGAATAAAGGAAAAGCAGATTATAATGTGCTTCCTGCTTCAGAGAACCCACAGCTT 
AGGCAGCTGAAGGCAGCAGGAGATGCCCTAAGTGACAAACTATACAAGGAAAACTATGAAAAG 
ACAAAAGCAAAGAGCATAAATTACTGCGAGACCCCCAAATTCAAGCTCGATACTGTTCTGCAG
AACTTCAGTAGTGATAAAAAATATAAAGATTCCTACTTAAAAGATATTTTGGGACATTATGTA
GGCAGCTTCGAGGATCCATACCATTCACACTGCATGAAAGTCACAGCTCAAAACAGTGATAAA 
AACTACAAAGCAGAATACGAAGAAGACAGAGGCAAAGGCTTCTTCCCTCAGACCATAACTCAA 
GAATATGAAGCAATTAAGAAACTAGATCAGTGTAAAGACCACACCTACAAAGTCCATCCAGAT 
AAGACAAAATTCACCCAAGTTACAGACTCTCCTGTTCTGCTACAAGCCCAAGTCAATTCCAAA 
CAACTGAGTGACTTAAATTACAAAGCAAAACATGAAAGTGAAAAGTTCAAGTGCCATATCCCC 
CCTGATACTCCTGCTTTTATCCAGCACAAAGTCAATGCCTATAACTTGAGTGATAATCTTTAT 
AAGCAAGACTGGGAGAAGAGCAAAGCCAAAAAGTTTGACATTAAAGTGGATGCCATTCCCCTG 
CTGGCAGCCAAAGCCAACACCAAGAACACCAGCGATGTGATGTACAAGAAAGACTATGAAAAA
AACAAAGGGAAAATGATTGGAGTCCTCAGCATTAATGACGATCCCAAGATGCTGCACTCCTTG
AAGGTGGCCAAAAACCAGAGTGATAGATTATACAAGGAAAACTATGAGAAGACAAAGGCAAAG
AGTATGAATTACTGTGAGACCCCAAAATATCAACTTGATACTCAGCTGAAGAACTTCAGTGAG

GCTAGATATAAAGACTTATATGTAAAGGATGTTTTGGGACATTATGTAGGCAGCATGGAGGAC 

CCATATCACACACACTGCATGAAAGTTGCAGCTCAAAACAGTGATAAAAGTTACAAAGCAGAA 

TATGAAGAAGATAAAGGAAAATGCTATTTCCCTCAGACAATAACACAAGAATATGACGCAATC 

AAGAAGCTGGACCAGTGTAAAGATCATACCTACAAAGTTCATCCAGATAAGACCAAATTCACG 

GCAGTCACTGATTCTCCTGTACTGTTGCAAGCCCAGCTCAACACGAAACAGCTTAGTGATCTG 

AATTACAAAGCAAAACATGAAGGTGAGAGGTTCAAGTGCCATATACCAGCAGATGCTCCACAG 

TTTATCCAACACAGAGTCAATGCCTATAATCTGAGTGATAATGTTTATAAGCAAGACTGGGAG 

AAGAGCAAAGCCAAGAAGTTTGACATTAAAGTGGACGCCATTCCCCTGTTGGCAGCCAAAGCC

AACACCAAGAACACCAGCGATGTGATGTACAAGAAAGACTATGAAAAGAGCAAAGGGAAAATG 

ATTGGAGCCCTCAGCATTAATGACGATCCAAAGATGCTGCACTCCTTGAAGACAGCCAAAAAC 

CAGAGTGATCGCGAATATCGAAAAGATTATGAAAAGTCAAAAACTATCTACACGGCACCTCTT 

GATATGCTCCAAGTCACTCAAGCTAAGAAATCTCAGGCAATTGCCAGCGACGTTGATTATAAG 

CACATCTTACACAGTTACAGCTACCCCCCTGATAGCATCAATGTGGACCTTGCCAAGAAGGCA 

TATGCGCTGCAGAGCGATGTTGAATACAAAGCTGACTACAATAGCTGGATGAAAGGTTGTGGC 

TGGGTGCCTTTTGGGTCCTTAGAAATGGAAAAGGCAAAGCGAGCTTCAGACATCCTCAATGAG 

AAAAAATATCGCCAACATCCAGACACCCTCAAGTTTACCTCGATTGAAGATGCTCCAATTACA 

GTACAGTCTAAAATTAACCAGGCCCAGAGGAGTGATATCGCTTACAAAGCCAAAGGAGAGGAA 

ATTATTCACAATTACAACCTGCCACCAGACCTGCCCCAGTTCATCCAGGCTAAAGTTAATGCC 

TACAATATCAGTGAGAATATGTACAAAGCAGACTTGAAAGACTTGAGCAAGAAGGGATATGAC 

CTGAGAACTGATGCGATTCCCATCAGAGCTGCCAAAGCTGCCAGGCAGGCGGCGAGTGACGTT 

CAGTACAAAAAAGACTATGAAAAGGCTAAAGGGAAAATGGTTGGCTTCCAAAGTCTTCAAGAT 

GACCCTAAACTGGTTCATTATATGAACGTGGCCAAGATACAATCAGATCGGGAGTATAAAAAA 

GACTATGAGAAGACAAAGTCCAAATACAACACGCCCCATGATATGTTCAATGTCGTGGCGGCT 

AAGAAAGCCCAGGATGTGGTCAGCAATGTCAACTATAAGCATTCTCTCCATCATTACACCTAC 

TTGCCTGACGCCATGGACCTGGAGCTGTCTAAGAACATGATGCAGATACAGAGTGATAACGTC 

TACAAGGAAGACTACAACAACTGGATGAAAGGCATTGGCTGGATTCCTATTGGCAGTCTCGAC 

GTCGAAAAAGTTAAAAAGGCCGGTGATGCTCTGAATGAAAAGAAGTACAGGCAACATCCAGAC 

ACCCTCAAATTTACCAGCATTGTGGACTCCCCAGTTATGGTCCAGGCAAAACAGAACACGAAG 

CAAGTCAGTGATATCTTATACAAGGCTAAAGGAGAAGATGTGAAACATAAATACACCATGAGT 

CCTGATCTTCCTCAGTTTCTCCAGGCCAAGTGCAATGCTTACAGTATAAGTGACGTCTGTTAT 

AAACGGGATTGGCATGACTTAATACGCAAGGGCAACAATGTGCTGGGCGATGCTATTCCCATC 

ACTGCAGCCAAGGCATCGAGAAACATTGCCAGTGATTATAAATACAAGGAAGCTTATGAGAAG 

TCAAAGGGAAAGCATGTGGGTTTCAGAAGCCTCCAGGATGATCCCAAGCTGGTCCACTATATG 

AATGTGGCAAAGCTGCAGTCTGATCGTGAATACAAGAAGAACTATGAGAACACCAAAACCAGC

TACCATACCCCTGGGGACATGGTTACGATCACAGCTGCAAAGATGGCCCAGGATGTCGCTACC 

AATGTCAACTACAAACAGCCATTGCATCATTACACATACCTACCTGACGCCATGAGTCTTGAG 

CATACGAGGAATGTCAATCAAATTCAGAGTGATAATGTGTATAAAGACGAGTATAACAGCTTC 

TTGAAGGGCATCGGATGGATCCCTATTGGTTCCCTGGAGGTGGAGAAGGTCAAGAAAGCAGGC 

GATGCATTAAATGAGAGGAAGTATCGACAGCACCCAGATACCGTCAAGTTCACAAGTGTGCCT 

GATTCCATGGGCATGATGTTGGCTCAGCATAACACAAAGCAGCTAAGTGATTTGAACTACAAG

GTAGAGGGAGAGAAACTGAAGCACAAGTATACTATTGACCCTGAATTGCCTCAGTTTATTCAA 

GCCAAAGTCAACGCCCTCAACATGAGTGATGCTCATTATAAAGCAGATTGGAAGAAAACCATT 

CGCAAGGGCTATGATTTGAGACCAGATGCCATCCCAATTGTTGCTGCAAAAAGTTCAAGGAAT 

ATTGCTAGTGATTGCAAATATAAGGAGGCCTACGAGAAAGCCAAAGGCAAGCAAGTTGGATTT 

CTCAGTCTTCAGGATGATCCTAAACTGGTTCACTACATGAATGTGGCCAAAATCCAGTCTGAT 

CGTGAGTACAAAAAGGGCTATGAAGCCAGCAAGACCAAGTACCACACACCTCTGGATATGGTC

AGTGTGACAGCTGCAAAGAAATCTCAGGAGGTTGCCACCAACGCCAACTACAGACAGTCATAC 

CACCACTACACTCTCCTGCCCGATGCCTTGAATGTGGAGCACTCCAGGAATGCCATGCAGATT

CAGAGTGATAATCTGTACAAATCTGACTTCACCAATTGGATGAAAGGGATCGGCTGGGTGCCC 

ATAGAGTCCCTGGAGGTGGAGAAGGCAAAGAAAGCAGGAGAGATTCTTAGTGAGAAGAAGTAT 

CGCCAGCACCCCGAGAAGCTGAAGTTCACTTACGCCATGGACACAATGGAACAGGCACTTAAC

AAGAGTAACAAACTGAACATGGACAAGAGGCTCTACACTGAAAAATGGAACAAGGACAAGACC 

ACCATTCATGTCATGCCTGACACACCGGATATTTTACTCTCCAGAGTAAACCAAATCACCATG 

AGTGATAAACTGTACAAAGCTGGCTGGGAAGAGGAAAAGAAGAAAGGATATGACCTGAGGCCT 

GATGCCATTGCAATAAAGGCTGCAAGAGCCTCTAGAGACATTGCCAGTGATTACAAATACAAG 

AAAGCCTATGAACAAGCCAAAGGGAAACACATTGGCTTCCGGAGCCTGGAAGATGACCCCAAG

CTGGTGCACTTCATGCAAGTGGCCAAGATGCAGTCAGACCGGGAATACAAGAAGGGATATGAG

AAATCCAAGACCTCCTTCCACACCCCGGTGGACATGCTCAGTGTGGTGGCAGCCAAGAAGTCT 

CAGGAAGTGGCCACCAATGCCAACTACAGGAACGTGATCCATACCTACAACATGCTTCCTGAT 

GCCATGAGCTTTGAATTGGCCAAAAATATGATGCAGATTCAAAGTGATAATCAGTACAAGGCT 

GACTATGCTGACTTCATGAAGGGCATTGGATGGCTCCCTCTGGGCTCCCTGGAAGCAGAGAAA 

AACAAGAAAGCCATGGAGATTATTAGTGAAAAGAAGTACCGCCAGCACCCAGACACTTTGAAG 

TATTCCACACTCATGGACTCGATGAACATGGTTTTGGCCCAGAATAATGCAAAAATTATGAAC 

GAACATCTCTACAAACAAGCATGGGAGGCTGACAAAACCAAAGTCCACATCATGCCTGATATC 

CCCCAGATTATTTTGGCAAAGGCAAATGCAATTAATATAAGTGATAAACTCTACAAACTTTCC
TTGGAAGAGTCTAAAAAGAAAGGCTATGATCTCAGACCTGATGCAATTCCTATCAAAGCTGCC

AAGGCTTCCAGAGATATTGCAAGTGATTATAAATACAAGTACAATTATGAAAAAGGGAAGGGG 

AAAATGGTTGGTTTCCGCAGTCTCGAGGATGATCCCAAATTAGTCCATTCCATGCAAGTGGCT

AAGATGCAATCTGATCGGGAGTACAAGAAAAACTATGAGAACACAAAGACCAGCTACCACACC 

CCTGCCGACATGCTCAGTGTCACGGCTGCAAAGGATGCCCAAGCCAACATCACCAACACTAAC 

TACAAGCACCTGATTCACAAGTACATCCTCCTTCCAGATGCAATGAACATTGAGCTGACCAGG 

AATATGAATCGCATACAGAGTGATAATGAATATAAGCAAGATTACAATGAATGGTACAAAGGG 

CTTGGCTGGAGTCCAGCAGGTTCTCTGGAAGTGGAGAAGGCCAAGAAAGCAACTGAATATGCC 

AGTGATCAGAAATACCGCCAGCACCCGAGCAACTTCCAGTTTAAGAAGCTGACTGATTCCATG 

GACATGGTGCTTGCCAAGCAGAATGCACATACCATGAACAAGCATTTATACACCATTGATTGG

AATAAAGATAAGACCAAGATTCATGTGATGCCTGATACACCAGATATTTTACAAGCCAAGCAG 

AATCAAACACTGTATAGTCAGAAACTCTATAAACTTGGATGGGAAGAAGCTTTGAAGAAAGGC 

TATGATCTCCCAGTTGATGCAATTTCTGTACAGCTAGCTAAAGCTTCAAGAGACATTGCTAGT 

GATTATAAATACAAACAAGGCTACCGAAAGCAACTTGGCCACCATGTTGGATTCCGGAGTCTG 

CAAGATGACCCAAAACTTGTGTTGTCCATGAATGTAGCCAAAATGCAGAGTGAAAGAGAATAC 

AAGAAGGACTTTGAGAAGTGGAAAACTAAGTTCTCCAGCCCAGTGGACATGTTGGGAGTGGTA 

CTGGCCAAGAAGTGTCAGGAGTTGGTTAGTGACGTGGACTACAAGAACTACCTGCATCAGTGG 

ACATGTCTGCCTGATCAGAACGATGTTGTGCAAGCTAAGAAAGTTTATGAACTGCAAAGTGAG 

AATCTATATAAATCTGACCTTGAGTGGCTGAGAGGCATAGGATGGAGTCCCTTGGGTTCTTTA 

GAGGCAGAAAAGAACAAGCGGGCTTCGGAAATCATCAGTGAGAAGAAATATCGTCAGCCTCCA 

GACAGAAACAAGTTCACCAGCATTCCTGATGCCATGGATATAGTTCTGGCAAAGACAAATGCC 

AAAAATAGGAGTGATAGACTTTATAGAGAAGCTTGGGACAAAGACAAGACTCAGATCCACATC 

ATGCCTGATACACCTGACATTGTTCTGGCTAAAGCAAACTTAATCAACACAAGTGATAAACTC 

TACCGAATGGGTTATGAGGAGCTGAAGAGAAAAGGTTACGATCTTCCTGTTGATGCCATACCA 

ATCAAAGCAGCAAAAGCCTCCCGGGAAATTGCCAGTGAATACAAGTACAAGGAAGGCTTTCGC 

AAGCAGCTCGGCCACCACATTGGTGCCCGGAACATTGAAGATGACCCCAAGATGATGTGGTCC 

ATGCATGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGACTTTGAGAAGTGGAAGACC 

AAGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGTTGGCCTATAAGTGCCAGACCTTAGTC 

AGCGACGTGGACTACAAGAACTACCTGCACCAGTGGACATGCCTGCCCGACCAGAGCGATGTC 

ATCCATGCTCGGCAGGCCTATGACCTCCAGAGCGATAATTTGTACAAGTCAGACCTTCAGTGG 

CTAAAAGGCATTGGCTGGATGACTAGTGGTTCTCTCGAGGATGAGAAAAATAAACGAGCCACC 

CAGATTTTGAGTGACCATGTTTACCGTCAGCACCCAGATCAATTTAAGTTTTCCAGCCTTATG 

GATTCCATACCAATGGTTTTGGCAAAAAACAATGCTATTACCATGAATCATCGCCTCTATACA 

GAAGCTTGGGATAAAGATAAAACCACTGTCCACATTATGCCAGATACCCCTGAAGTTTTATTA 

GCTAAACAAAACAAAGTAAATTACAGTGAGAAATTGTATAAGCTTGGCCTAGAAGAAGCCAAG

AGGAAAGGTTATGACATGCGGGTAGATGCCATTCCTATCAAGGCAGCCAAGGCCTCCAGAGAT 

ATTGCAAGTGAATTCAAGTACAAAGAAGGCTATCGTAAGCAGCTCGGCCACCACATTGGTGCC 

CGAGCTATACGTGATGACCCCAAGATGATGTGGTCCATGCACGTGGCCAAGATCCAGAGTGAC 

AGGGAGTACAAGAAGGACTTTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTG 

GGGGTGGTGCTGGCCAAGAAGTGCCAGACCTTAGTCAGCGATGTGGACTACAAGAACTACCTG 

CACCAGTGGACATGCCTGCCCGACCAGAGCGACGTCATCCATGCTCGGCAGGCCTATGACCTC 

CAGAGCGATAATATGTACAAGTCTGATCTCCAGTGGATGAGAGGCATTGGCTGGGTGTCCATT 

GGCTCTTTGGATGTGGAAAAATGCAAAAGGGCAACTGAAATTTTGAGTGATAAAATCTATCGC 

CAGCCTCCAGACAGATTCAAATTTACCAGTGTGACTGACTCTCTGGAACAAGTGCTGGCCAAG

AACAATGCTCTCAACATGAATAAGCGTTTATACACAGAGGCCTGGGACAAAGACAAGACTCAA 

ATTCACATAATGCCTGATACACCAGAGATTATGTTGGCAAGGCAGAACAAAATCAACTACAGT 

GAGACTCTATACAAACTTGCCAATGAAGAAGCAAAAAAGAAAGGCTACGACTTGCGAAGTGAC 

GCCATCCCCATCGTGGCTGCCAAGGCCTCCAGGGACGTTATCAGTGATTACAAATACAAAGAT 

GGTTACCGCAAGCAGCTCGGCCACCACATTGGAGCCCGGAACATTGAAGATGACCCCAAGATG 

ATGTGGTCCATGCATGTGGCCAAGATCCAGAGTGACAGGGAGTATAAGAAGGACTTTGAGAAG 

TGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTGGGAGTGGTGTTAGCCAAGAAGTGCCAG 

ACCTTAGTCAGCGATGTGGACTACAAGAACTACCTGCACGAGTGGACGTGCCTGCCCGACCAG 

AATGATGTCATCCATGCTCGGCAGGCCTATGACCTCCAGAGCGATAACATTTACAAATCTGAT 

CTCCAGTGGCTGAGAGGCATTGGCTGGGTCCCCATTGGGTCTATGGATGTGGTCAAGTGCAAG 

AGAGCTGCTGAAATACTGAGTGATAACATCTACCGCCAGCCTCCGGACAAGCTGAAATTTACC 

AGTGTGACTGACTCTCTAGAGCAGGTGCTGGCCAAGAACAATGCTCTCAATATGAACAAGCGC 

TTATACACAGAAGCCTGGGACAAAGACAAGACCCAAGTCCATATTATGCCTGATACACCTGAA

ATCATGTTGGCAAGACAAAATAAAATAAATTATAGTGAGAGCCTCTATCGTCAGGCCATGGAA

GAAGCCAAGAAAGAAGGCTATGACTTGAGAAGTGATGCCATTCCCATTGTGGCTGCCAAGGCC

TCTCGGGATATTGCCAGTGATTACAAATACAAAGAAGCATATCGTAAGCAGTTGGGTCACCAC 

ATTGGCGCCCGAGCAGTACACGATGACCCCAAGATAATGTGGTCCCTCCACATTGCCAAAGTG 

CAGAGTGACCGTGAGTACAAGAAAGATTTTGAGAAATACAAGACAAGGTACAGCAGCCCAGTG

GACATGCTTGGTATCGTTTTGGCCAAGAAGTGTCAGACCTTGGTCAGCGATGTGGACTATAAA 

CATCCTCTGCATGAATGCATCTGCCTGCCCGACCAGAATGACATCATTCATGCACGGAAAGCC

TATGACCTCCAGAGTGACAATTTGTATAAGTCAGACCTTGAATGGATGAAAGGCATTGGCTGG 

GTTCCGATTGATTCCTTGGAAGTTGTTAGGGCCAAGAGAGCTGGAGAATTACTTAGTGATACT
ATCTACCGTCAGCGTCCAGAAACGCTGAAATTTACCAGTATAACGGACACTCCGGAGCAGGTG
CTGGCAAAAAACAATGCTTTAAACATGAATAAGCGCTTATATACTGAAGCCTGGGACAATGAC 
AAGAAAACTATTCATGTCATGCCTGATACACCAGAAATCATGTTAGCCAAACTCAACCGAATA 
AACTACAGTGATAAACTCTATAAACTTGCTTTGGAAGAGTCCAAGAAGGAAGGCTATGACTTG 
CGTCTGGATGCCATTCCAATCCAAGCAGCCAAGGCTTCAAGAGATATTGCTAGTGATTACAAG
TACAAGGAAGGCTACCGCAAACAGCTTGGCCACCATATTGGGGCCCGGAACATTAAGGATGAC 
CCGAAGATGATGTGGTCCATCCATGTGGCCAAGATCCAGAGTGACAGGGAGTACAAGAAGGAG 
TTTGAGAAGTGGAAGACCAAGTTCAGCAGCCCAGTGGACATGCTGGGGGTGGTGCTGGCCAAG 
AAGTGTCAGATCCTTGTAAGCGACATAGACTACAAGCATCCCCTGCATGAATGGACCTGCCTG
CCTGATCAGAATGACGTCATTCAGGCTCGGAAGGCCTATGACCTGCAGAGTGATGCTATTTAC 
AAATCTGATCTTGAGTGGCTGAGAGGCATAGGATGGGTTCCCATTGGCTCTGTAGAGGTCGAG 
AAAGTGAAGAGAGCTGGAGAAATCCTGAGTGACAGGAAGTATCGCCAGCCTGCAGACCAGCTC 
AAATTCACATGCATTACCGACACTCCGGAAATTGTCCTAGCAAAGAATAATGCCCTGACAATG 
AGCAAGCATTTATACACAGAAGCTTGGGATGCTGACAAAACCTCCATCCACGTGATGCCAGAC 
ACCCCAGATATCCTGCTGGCCAAGAGTAATTCTGCCAATATCAGCCAAAAACTTTACACCAAG 
GGATGGGATGAATCAAAGATGAAGGACTATGATCTGAGAGCAGATGCTATTTCCATCAAAAGT
GCCAAGGCCTCCAGGGACATCGCCAGTGACTACAAATACAAGGAAGCCTATGAGAAACAGAAA
GGCCACCACATTGGAGCCCAGAGCATTGAAGATGATCCCAAGATTATGTGTGCCATACATGCA 
GAAAAAATTCAAAGTGAAAGGGAGTACAAGAAGGAATTCCAAAAGTGGAAAACCAAGTTCTCT
AGCCCAGTGGACATGTTAAGCATCTTGCTGGCCAAGAAATGTCAGACTTTGGTCACTGACATT 
TATTATCGCAATTACCTGCATGAATGGACATGCATGCCGGATCAAAACGACATTATCCAAGCA 
AAAAAGGCCTATGACCTGCAGAGTGATGCCCTCTACAAGGCTGACTTGGAGTGGTTGCGTGGC 
ATTGGCTGGATGCCCCAAGGGTCTCCTGAAGTGTTGAGAGTCAAAAACGCCCAGAATATCTTT
TGTGACAGTGTCTATCGGACGCCTGTGGTGAACCTTAAGTACACAAGCATTGTTGACACACCT 
GAAGTGGTCCTTGCTAAATCAAATGCTGAAAATATTAGTATTCCAAAGTACAGAGAGGTTTGG 
GACAAGGATAAAACTTCAATACACATAATGCCAGATACTCCAGAAATTAATCTCGCTAGAGCA
AATGC TC TTAATGTG AGC AAT AAACT TT ACCGTG AG GGCTGGG ATG AAATG AAGG CG 0" S CTGT 
GATGTCCGGCTGGATGCCATCCCCATCCAGGCTGCCAAGGCCTCCAGGGAGATTGCCAGTGAC 
TATAAATATAAGCTTGACCATGAGAAGCAGAAGGGACACTACGTGGGCACCCTCACAGCCAGG
GATGACAACAAGATCCGCTGGGCCCTCATAGCTGACAAGCTCCAGAATGAACGAGAGTACCGG 
CTGGACTGGGCCAAATGGAAGGCCAAGATCCAGAGCCCTGTGGACATGCTTTCCATCCTGCAC 
TCTAAAAATTCCCAGGCTCTGGTCAGTGACATGGATTACCGCAATTACCTGCACCAGTGGACC 
TGCATGCCCGACCAGAACGATGTGATTCAGGCCAAGAAGGCCTACGAACTGCAGAGCGATAAT 
GTTTACAAGGCTGACTTGGAATGGTTGCGTGGAATTGGGTGGATGCCAAATGACTCCGTGTCC 
GTCAATCATGCCAAACATGCCGCGGACATCTTCAGTGAGAAAAAATATCGCACAAAAATAGAA 
ACTCTCAACTTTACGCCTGTGGATGACAGAGTTGATTATGTGACAGCGAAACAAAGTGGCGAG 
ATCCTCGATGATATTAAATACCGGAAAGACTGGAATGCCACCAAATCAAAGTACACCCTCACA
GAAACCCCCCTGCTGCACACTGCCCAGGAGGCTGCTAGGATACTGGACCAGTATCTCTACAAG 
GAAGGCTGGGAGAGACAAAAAGCCACAGGTTACATTTTGCCTCCAGATGCTGTGCCATTTGTT 
CATGCCCATCACTGCAATGACGTTCAGAGTGAGCTGAAATACAAAGCTGAACATGTGAAGCAA 
AAAGGTCATTATGTTGGTGTCCCGACGATGAGAGATGATCCTAAGCTGGTTTGGTTTGAGCAT 
GCAGGCCAGATTCAGAATGAGAGACTATACAAAGAGGACTATCACAAAACAAAGGCCAAAATC 
AATATACCTGCTGATATGGTGTCAGTCTTGGCCGCCAAGCAGGGGCAGACCCTTGTCAGTGAT 
ATTGATTATCGTAATTACTTGCACCAATGGATGTGTCATCCTGACCAGAACGATGTTATTCAG 
GCAAGAAAGGCCTATGACCTACAGAGTGATAATGTCTACAGAGCTGACCTGGAGTGGCTCCGA 
GGCATTGGCTGGATCCCACTGGATTCTGTGGACCATGTAAGGGTTACTAAGAACCAGGAAATG
ATGAGTCAGATCAAATATAAGAAAAATGCCCTTGAAAACTATCCTAACTTTACAAGTGTGGTG 
GATCCTCCAGAGATTGTTTTAGCCAAGATTAATTCTGTCAATCAAAGTGATGTAAAATATAAA 
GAAACATTTAATAAAGCAAAGGGCAAATATACGTTTTCACCAGATACACCACATATCTCCCAC 
TCCAAAGACATGGGAAAACTCTACAGTACTATACTGTATAAAGGGGCGTGGGAGGGCACCAAG 
GCCTATGGCTACACCCTGGATGAGCGCTACATTCCCATTGTTGGAGCCAAGCATGCTGATCTG 
GTGAACAGTGAGCTTAAATACAAAGAGACATATGAGAAGCAGAAAGGTCACTACCTGGCTGGA 
AAAGTGATCGGTGAATTCCCTGGTGTGGTTCACTGTCTGGATTTCCAAAAGATGAGGAGTGCG 
TTGAACTACAGAAAACATTATGAGGATACCAAAGCAAATGTTCATATCCCCAATGACATGATG 
AATCACGTGCTGGCTAAAAGGTGCCAGTACATCCTCAGTGACCTGGAGTATCGACACTATTTC 
CACCAGTGGACGTCTCTTCTGGAAGAACCCAATGTTATACGCGTCCGAAACGCCCAGGAGATC 
TTGAGTGATAATGTGTATAAAGATGACCTGAATTGGTTGAAAGGCATTGGTTGCTACGTTTGG 
GATACACCCCAAATCCTCCATGCCAAGAAATCATACGACCTTCAGAGTCAGCTACAATATACA
GCAGCAGGTAAAGAAAATCTACAAAACTATAATCTGGTCACAGACACGCCCCTCTATGTGACT
GCTGTTCAGAGTGGCATTAATGCCAGTGAGGTAAAATATAAAGAAAATTATCATCAGATTAAG 
GACAAATACACAACAGTTCTAGAAACAGTGGATTATGACAGAACCAGAAACCTGAAGAATCTT 
TACAGCAGTAACCTGTACAAGGAGGCCTGGGATAGAGTGAAAGCCACCAGCTACATCCTGCCT 
TCCAGCACCTTGTCCCTGACACACGCCAAGAACCAGAAGCATCTGGCCAGCCATATCAAATAT 
CGGGAAGAATATGAAAAGTTCAAAGCTCTTTATACGTTACCAAGAAGTGTTGACGATGATCCG 
AACACAGCACGGTGCCTCCGAGTTGGCAAGCTTAACATCGATCGCCTGTACAGATCAGTTTAT
GAAAAGAACAAGATGAAAATCCACATCGTGCCCGACATGGTAGAGATGGTTACTGCCAAGGAT
TCCCAGAAGAAAGTCAGTGAGATTGATTACCGCCTGCGCCTCCACGAATGGATTTGCCACCCC
GACTTGCAAGTCAATGATCACGTCAGGAAAGTCACAGATCAGATCAGCGATATTGTATACAAG 
GATGACCTCAACTGGCTGAAAGGCATTGGTTGCTACGTCTGGGACACTCCTGAAATCCTCCAT
GCCAAGCATGCTTATGATCTACGTGATGATATCAAGTATAAAGCTCACATGTTGAAAACAAGG

AATGACTACAAGCTTGTCACAGATACACCAGTCTACGTGCAGGCTGTCAAAAGTGGGAAACAG

CTAAGTGACGCTGTCTACCACTATGACTATGTGCACAGTGTCAGAGGCAAAGTGGCTCCAACT

ACCAAAACCGTGGATCTGGACCGGGCCCTTCATGCATACAAGCTCCAGAGTTCGAATCTATAC
AAAACCAGCCTGCGCACCCTGCCCACTGGATATAGACTTCCAGGTGACACTCCTCACTTCAAA
CACATCAAGGACACCCGTTACATGAGCAGTTATTTCAAGTACAAAGAAGCCTATGAACACACC
AAGGCATATGGGTATACACTTGGCCCCAAAGATGTTCCATTTGTCCACGTCCGGAGAGTCAAC
AATGTTACCAGCGAGAGACTGTATCGGGAATTGTACCACAAACTGAAAGACAAGATCCATACA 
ACTCCCGATCCCCCTGAGATCCGCCAAGTCAAGAAGACACAAGAGGCTGTCAGTGAGTTGATC
TACAAATCAGACTTCTTCAAGATGCAGGGCCACATGATCTCTCTGCCATACACACCCCAAGTG

ATCCATTGCCGCTATGTGGGAGACATCACCAGTGATATTAAATACAAAGAGGACTTGCAGGTC
CTGAAGGGATTTGGCTGCTTCCTGTATGACACTCCTGACATGGTCCGCTCCCGGCACCTGCGG

AAGCTCTGGTCTAATTACCTATACACTGATAAGGCAAGGGAGATGCGAGACAAATACAAAGTG
GTGCTTGACACTCCAGAATACAGAAAAGTGCAAGAACTGAAGACACATCTGAGTGAGCTGGTC
TACAGAGCTGCAGGCAAGAAGCAGAAGTCAATCTTTACTTCAGTTCCTGATACTCCTGATCTT
TTAAGAGCCAAGCGAGGGCAGAAGCTTCAGAGTCAGTATCTGTATGTTGAACTTGCCACCAAA
GAGAGACCCCATCATCACGCTGGAAACCAGACCACAGCCTTGAAGCATGCTAAAGACGTGAAG
GACATGGTCAGTGAGAAAAAGTACAAGATTCAATATGAAAAGATGAAAGACAAGTACACTCCG
GTTCCAGATACGCCAATCCTCATCAGAGCCAAGAGGGCTTACTGGAATGCCAGTGATCTACGC
TACAAAGAAACATTTCAAAAGACCAAAGGGAAATACCACACGGTGAAAGATGCCCTAGACATT 
GTCTATCATCGCAAAGTCACAGATGACATCAGTAAAATAAAATACAAGGAGAACTACATGAGC
CAGTTGGGTATCTGGAGGTCCATTCCTGATCGTCCAGAGCATTTCCACCACCGAGCAGTCACT 
GACACAGTCAGTGATGTAAAATATAAAGAAGACTTGACTTGGCTTAAAGGCATTGGTTGCTAT 
GCCTATGATACCCCTGATTTCACTCTGGCTGAAAAGAACAAGACTCTCTACAGCAAGTATAAG

TATAAAGAAGTATTTGAAAGGACAAAGTCAGATTTCAAGTATGTTGCCGACTCTCCGATCAAT
AGGCATTTCAAGTATGCAACTCAATTGATGAATGAGAAAAAATACAGAGCTGATTATGAGCAG 

CGGAAAGATAAATACCACCTGGTAGTCGATGAGCCTAGACATCTGCTGGCTAAGACCCGCAGC
GACCAGATCAGTCAGATCAAATACAGGAAAAACTATGAAAAATCAAAGGACAAATTTACCTCA

ATTGTGGATACTCCAGAACACCTGCGTACTACAAAAGTCAACAAACAAATCAGCGATATCCTT 
TATAAATTGGAATACAACAAGGCCAAACCCAGAGGCTACACCACAATCCACGACACGCCCATG 
TTGCTGCATGTCCGCAAGGTTAAAGATGAAGTCAGTGATCTGAAATACAAAGAAGTATACCAA
AGAAATAAATCCAACTGCACCATTGAGCCAGATGCTGTTCATATCAAAGCAGCCAAGGACGCC
TACAAAGTCAACACCAATCTGGACTATAAGAAACAGTACGAAGCCAACAAAGCCCACTGGAAG

TGGACTCCTGACCGACCGGACTTCCTCCAGGCTGCCAAGTCATCCCTGCAGCAAAGCGATTTT
GAATATAAGCTGGACCGGGAGTTCCTCAAGGGTTGCAAGCTTTCTGTCACTGATGACAAAAAC
ACGGTGCTCGCCCTCAGGAATACTTTAATAGAAAGTGATCTGAAATACAAAGAGAAACATGTC
AAGGAAAGAGGAACCTGTCATGCCGTACCTGACACGCCTCAGATCCTGCTGGCGAAGACTGTC
AGCAACCTGGTGTCTGAGAACAAGTACAAGGACCATGTCAAGAAGCACTTGGCACAGGGCTCA 
TACACAACACTACCAGAGACCCGGGACACTGTTCACGTCAAGGAAGTGACCAAGCATGTCAGT
GATACAAATTACAAAAAGAAGTTTGTCAAGGAGAAAGGAAAATCCAACTACTCCATCATGCTG 
GAGCCACCAGAGGTGAAACATGCTATGGAAGTGGCCAAGAAGCAAAGTGATGTCGCTTACAGA
AAAGATGCCAAAGAGAAGCTGCATTACACCACAGTGGCTGATCGACCAGACATCAAGAAGGCC
ACACAGGCAGCCAAACAGGCCAGTGAGGTGGAGTACAGAGCCAAGCACCGCAAGGAAGGCAGC
CATGGCTTAAGCATGCTCGGTCGCCCAGACATAGAAATGGCCAAGAAGGCAGCCAAGCTGAGC
AGCCAGGTTAAATACCGAGAAAATTTCGATAAAGAAAAGGGCAAGACACCAAAATACAATCCA 
AAAGACAGCCAGCTCTACAAAGTCATGAAAGATGCTAATAATCTTGCAAGTGAGGTTAAATAC 
AAGGCTGACCTGAAGAAACTTCACAAACCCGTGACTGACATGAAGGAGTCTCTGATCATGAAT
CATGTCCTGAATACAAGCCAACTTGCCAGTTCTTACCAGTACAAGAAGAAGTATGAGAAGAGT 
AAAGGCCACTACCACACCATACCCGATAATCTGGAGCAGCTTCACCTAAAAGAGGCCACAGAA
TTACAGAGTATAGTGAAATACAAAGAAAAGTATGAAAAGGAACGAGGAAAACCCATGCTGGAC

TTTGAAACACCAACGTACATCACTGCCAAAGAGTCTCAGCAGATGCAGAGTGGGAAAGAATAT
AGGAAAGATTATGAAGAGTCCATTAAAGGCAGAAACCTGACTGGCCTGGAGGTCACGCCAGCT
TTGTTACATGTCAAATATGCAACTAAAATAGCAAGCGAGAAAGAGTACAGGAAAGATCTAGAG

GAAAGCATCCGTGGGAAGGGCCTCACTGAAATGGAAGATACACCTGACATGCTAAGAGCAAAG 
AATGCCACTCAAATCCTCAATGAGAAAGAATATAAGCGAGACCTGGAACTGGAAGTCAAAGGA
AGAGGCCTGAATGCCATGGCCAATGAAACTCCGGATTTTATGAGGGCCAGGAATGCTACTGAT
ATTGCCAGTCAGATTAAGTATAAGCAATCAGCAGAAATGGAGAAAGCCAATTTCACTTCTGTG 
GTTGATACTCCAGAGATCATTCATGCCCAACAAGTCAAGAATCTTTCAAGCCAGAAAAAGTAC 
AAGGAAGATGCTGAGAAGTCCATGTCGTATTATGAGACTGTTTTGGACACCCCAGAGATACAG

AGAGTCCGGGAGAACCAAAAGAACTTCAGCCTTCTCCAATACCAGTGTGACCTTAAAAACAGT

AAAGGAAAAATTACAGTTGTTCAAGACACGCCAGAAATACTGCGTGTAAAAGAAAATCAGAAG

AATTTCAGCTCGGTTTTATATAAAGAGGATGTCTCACCAGGAACGGCTATCGGAAAGACACCT
GAGATGATGAGAGTGAAACAAACACAGGACCACATTAGCTCGGTGAAGTATAAGGAAGCAATA
GGACAAGGAACTCCAATCCCTGACCTGCCTGAAGTGAAACGTGTGAAGGAGACGCAGAAGCAC
ATTAGCTCGGTTATGTACAAAGAAAACTTGGGAACAGGCATTCCAACCACTGTGACTCCAGAG 
ATTGAGAGAGTCAAACGCAATCAAGAGAACTTTAGCTCGGTTTTGTACAAAGAAAATGTGGGG 
AAAGCCACCGCAACCCCTGTCACTCCTGAGATGCAGAGAGTCAAACGCAATCAAGAAAACATT 
AGCTCGGTATTGTACAAAGAGAACATGAGAAAAGCAACTCCGACACCTGTTACTCCAGAGATG 
GAGAGAGCTAAGCGCAACCAAGAAAACATTAGCTCGGTTCTTTATTCTGATAGTTTCCGGAAA
CAAATACAAGGCAAAGCTGCCTATGTATTGGATACCCCCGAGATGAGACGGGTGAGGGAGACC 
CAACGGCACATCTCAACGGTGAAATATCATGAAGACTTTGAGAAACACAAGGGTTGCTTCACA 
CCAGTGGTGACAGATCCTATCACTGAACGAGTAAAGAAGAACATGCAGGACTTCAGTGACATT 
AACTACCGAGGTATTCAGAGGAAAGTGGTAGAAATGGAACAAAAACGGAATGACCAAGATCAG 
GAGACTATTACAGGTTTACGTGTCTGGCGTACTAATCCTGGTTCGGTTTTTGACTATGATCCA
GCAGAAGACAACATCCAGTCCCGAAGCTTACACATGATTAATGTCCAAGCTCAGCGCCGGAGC
CGGGAGCAGTCACGATCTGCCAGTGCACTAAGCGTCAGTGGGGGTGAGGAGAAGTCTGAGCAT 
TCAGAAGCACCAGACCACCACCTTTCGACTTACAGCGACGGGGGTGTCTTTGCAGTCTCAACA
GCTTACAAACATGCAAAAACCACAGAGCTCCCACAACAACGATCATCTTCAGTTGCTACCCAA 
CAGACAACGGTATCTTCCATCCCATCTCATCCATCTACTGCTGGAAAAATCTTCCGTGCCATG
TATGACTATATGGCTGCTGATGCAGATGAGGTGTCCTTCAAGGATGGAGATGCCATCATAAAT 
GTTCAAGCAATTGATGAAGGCTGGATGTATGGCACTGTGCAGAGGACTGGCAGGACCGGAATG 
CTCCCAGCCAACTACGTTGAAGCTATTTAGGCATTTCAAAGCATCACACTTGTCTGCAGGACT 


TACAGATCCTGCAGTCAATGTTTCGGTTTAGACTCTCCACTGTTACCTAAGTTCTCAAGCTGC 


CTATGGTTTTTCTGTGTCAATGTGATTTATGGTAGTACCATCCTTTCTCCTTTGGGTTTTAAA 


ATAAGTTGCAGAACAGACACTTTAAAAGCTTCTGCAATATTATTTCTGTGCCTAGAGTCTTTC 


TCCATTATAAACATGTTTTAACATTATTTCTTTTCTAAAACAGGGATTTTGAATATGCCAAAC 


ACATTAAAGGAAAAATAGCAGAGATGTTCACCTTTTCCTTGCTGATTGCTAATGCTTATTATT




TCTAATTCAGTTCTGAAGTTATAAACTTATAATCAATACAAACCAGCAACTAATAAAACCTCT 


AATTCTGCAAAAAAAAAAAAAAAAAAAAAAGTCG 




ORF Start: ATG at 19 | ORF Stop: TAG at 19747
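The ORF coordinates can be cross-checked against the protein length reported for SEQ ID NO: 144. The following is a minimal sketch, assuming 1-based nucleotide positions, a start codon beginning at position 19 and a stop codon beginning at position 19747; the function name `orf_protein_length` is hypothetical and not part of the original disclosure.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Return the number of amino acids encoded by an ORF.

    `start` is the 1-based position of the first base of the start codon,
    `stop` is the 1-based position of the first base of the stop codon.
    The stop codon itself is not translated.
    """
    span = stop - start  # coding bases, start codon included, stop codon excluded
    if span <= 0 or span % 3 != 0:
        raise ValueError("stop codon is not in frame with the start codon")
    return span // 3


# Coordinates as read from the ORF line above (assumed values).
print(orf_protein_length(19, 19747))
```

Under these assumptions the result agrees with the 6576 aa length stated for the NOV35c protein.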




NOV35c, CG119566-03 Protein | SEQ ID NO: 144 | 6576 aa | MW at kD

MADDEDYEEWEYYTEEWYEEVPGETITKIYETTTTRTSDYEQSETSKPALAQPALAQPASA

KPVERRKVIRKKVDPSKFMTPYIAHSQICMQDLFSPNKYKEKFEKTKGQPYASTTDTPELRRIK 

KVQDQLSEVKYRMDGDVAKTICHVDEKAKDIEHAKKVSQQVSKVLYKQNWEDTKDKYLLPPDA 

PELVQAVKNTAMFSKKLYTEDWEADKSLFYPYNDSPELRRVAQAQKALSDVAYKKGLAEQQAQ 

FTPLADPPDIEFAKKVTNQVSKQKYKEDYENKIKGKWSETPCFEVANARMNADNISTRKYQED 

FENMKDQIYFMQTETPEYKMNKKAGVAASKVKYKEDYEKNKGKADYNVLPASENPQLRQLKAA 

GDALSDKLYKENYEKTKAKSINYCETPKFKLDTVLQNFSSDKKYKDSYLKDILGHYVGSFEDP 

YHSHCMKVTAQNSDKNYKAEYEEDRGKGFFPQTITQEYEAIKKLDQCKDHTYKVHPDKTKFTQ 

VTDSPVLLQAQVNSKQLSDLNYKAKHESEKFKCHIPPDTPAFIQHKVNAYNLSDNLYKQDWEK 

SKAKKFDIKVDAIPLLAAKANTKNTSDVMYKKDYEKNKGKMIGVLSINDDPKMLHSLKVAKNQ 

SDRLYKENYEKTKAKSMNYCETPKYQLDTQLKNFSEARYKDLYVKDVLGHYVGSMEDPYHTHC 

MKVAAQNSDKSYKAEYEEDKGKCYFPQTITQEYDAIKKLDQCKDHTYPCVHPDKTKFTAVTDSP 

VLLQAQLNTKQLSDLNYKAKHEGERFKCHIPADAPQFIQHRVNAYNLSDNVYKQDWEKSKAKK 

FDIKVDAIPLLAAKANTKNTSDVMYKKDYEKSKGKMIGALSI^DPKMLHSLKTAKNQSDREY 

RKDYEKSKTIYTAPLDMLQVTQAKKSQAIASDVDYKHILHSYSYPPDSINVDLAKKAYALQSD 

VEYKADYNSWMKGCGWVPFGSLEMEKAKRASDILNEKKYRQHPDTLKFTSIEDAPITVQSKIN 

QAQRSDIAYKAKGEEIIHNYNLPPDLPQFIQAKVNAYNISENMYKADLKDLSKKGYDLRTDAI 

PIRAAKAARQAASDVQYKKDYEKAKGKMVGFQSLQDDPKLVHYMNVAKIQSDREYKKDYEKTK 

SKYNTPHDMFNWAAKKAQDVVSNVNYKHSLHHYTYLPDAMDLELSKNMMQIQSDNVYKEDYN 

NWMKGIGWIPIGSLDVEKVKKAGDALNEKKYRQHPDTLKFTSIVDSPVMVQAKQNTKQVSDIL 

YKAKGEDVKHKYTMSPDLPQFLQAKCNAYSISDVCYKRDWHDLIRKGNNVLGDAIPITAAKAS 

RNIASDYKYKEAYEKSKGKHVGFRSLQDDPKLVHYMNVAKLQSDREYKKNYENTKTSYHTPGD 

MVTITAAKMAQDVATNVNYKQPLHHYTYLPDAMSLEHTRNVNQIQSDNVYKDEYNSFLKGIGW

IPIGSLEVEKVKKAGDALNERKYRQHPDTVKFTSVPDSMGMMLAQHNTKQLSDLNYKVEGEKL 

KHKYTIDPELPQFIQAKVNALNMSDAHYKADWKKTIRKGYDLRPDAIPIVAAKSSRNIASDCK 

YKEAYEKAKGKQVGFLSLQDDPKLVHYMNVAKIQSDREYKKGYEASKTKYHTPLDMVSVTAAK 

KSQEVATNANYRQSYHHYTLLPDALNVEHSRNAMQIQSDNLYKSDFTNWMKGIGWVPIESLEV 

EKAKKAGEILSEKKYRQHPEKLKFTYAMDTMEQALNKSNKLNMDKRLYTEKWNKDKTTIHVMP 

DTPDILLSRVNQITMSDKLYKAGWEEEKKKGYDLRPDAIAIKAARASRDIASDYKYKKAYEQA 

KGKHIGFRSLEDDPKLVHFMQVAKMQSDREYKKGYEKSKTSFHTPVDMLSVVAAKKSQEVATN

ANYRNVIHTYNMLPDAMSFELAKNMMQIQSDNQYKADYADFMKGIGWLPLGSLEAEKNKKAME 

IISEKKYRQHPDTLKYSTLMDSMNMVLAQNNAKIMNEHLYKQAWEADKTKVHIMPDIPQIILA

KANAINISDKLYKLSLEESKKKGYDLRPDAIPIKAAKASRDIASDYKYKYNYEKGKGKMVGFR 

SLEDDPKLVHSMQVAKMQSDREYKKNYENTKTSYHTPADMLSVTAAKDAQANITNTNYKHLIH 

KYILLPDAMNIELTRNMNRIQSDNEYKQDYNEWYKGLGWSPAGSLEVEKAKKATEYASDQKYR 

QHPSNFQFKKLTDSMDMVLAKQNAHTMNKHLYTIDWNKDKTKIHVMPDTPDILQAKQNQTLYS 

QKLYKLGWEEALKKGYDLPVDAISVQLAKASRDIASDYKYKQGYRKQLGHHVGFRSLQDDPKL



230 



WO 03/023002 



PCT/US02/28539



VLSMNVAKMQSEREYKKDFEKWKTKFSSPVDMLGVVLAKKCQELVSDVDYKNYLHQWTCLPDQ

NDVVQAKKVYELQSENLYKSDLEWLRGIGWSPLGSLEAEKNKRASEIISEKKYRQPPDRNKFT

SIPDAMDIVLAKTNAKNRSDRLYREAWDKDKTQIHIMPDTPDIVLAKAKLINTSDKLYRMGYE 

ELKRKGYDLPVDAIPIKAAKASREIASEYKYKEGFRKQLGHHIGARNIEDDPKMMWSMHVAKI 

QSDREYKKDFEKWKTKFSSPVDMLGVVLAYKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQA

YDLQSDNLYKSDLQWLKGIGWMTSGSLEDEKNKRATQILSDHVYRQHPDQFKFSSLMDSIPMV 

LAKNNAITMNHRLYTEAWDKDKTTVHIMPDTPEVLLAKQNKVNYSEKLYKLGLEEAKRKGYDM 

RVDAIPIKAAKASRDIASEFKYKEGYRKQLGHHIGARAIRDDPKMMWSMHVAKIQSDREYKKD 

FEKWKTKFSSPVDMLGVVLAKKCQTLVSDVDYKNYLHQWTCLPDQSDVIHARQAYDLQSDNMY

KSDLQWMRGIGWVSIGSLDVEKCKRATEILSDKIYRQPPDRFKFTSVTDSLEQVLAKNNALNM 

NKRLYTEAWDKDKTQIHIMPDTPEIMLARQNKINYSETLYKLANEEAKKKGYDLRSDAIPIVA 

AKASRDVISDYKYKDGYRKQLGHHIGARNIEDDPKMMWSMHVAKIQSDREYKKDFEKWKTKFS 

SPVDMLGVVLAKKCQTLVSDVDYKNYLHEWTCLPDQNDVIHARQAYDLQSDNIYKSDLQWLRG

IGWVPIGSMDVVKCKRAAEILSDNIYRQPPDKLKFTSVTDSLEQVLAKNNALNMNKRLYTEAW

DKDKTQVHIMPDTPEIMLARQNKINYSESLYRQAMEEAKKEGYDLRSDAIPIVAAKASRDIAS 

DYKYKEAYRKQLGHHIGARAVHDDPKIMWSLHIAKVQSDREYKKDFEKYKTRYSSPVDMLGIV 

LAKKCQTLVSDVDYKHPLHECICLPDQNDIIHARKAYDLQSDNLYKSDLEWMKGIGWVPIDSL 

EVVRAKRAGELLSDTIYRQRPETLKFTSITDTPEQVLAKNNALNMNKRLYTEAWDNDKKTIHV

MPDTPEIMLAKLNRINYSDKLYKLALEESKKEGYDLRLDAIPIQAAKASRDIASDYKYKEGYR 

KQLGHHIGARNIKDDPKMMWSIHVAKIQSDREYKKEFEKWKTKFSSPVDMLGVVLAKKCQILV 

SDIDYKHPLHEWTCLPDQNDVIQARKAYDLQSDAIYKSDLEWLRGIGWVPIGSVEVEKVKRAG 

EILSDRKYRQPADQLKFTCITDTPEIVLAKNNALTMSKHLYTEAWDADKTSIHVMPDTPDILL 

AKSNSANISQKLYTKGWDESKMKDYDLRADAISIKSAKASRDIASDYKYKEAYEKQKGHHIGA 

QSIEDDPKIMCAIHAEKIQSEREYKKEFQKWKTKFSSPVDMLSILLAKKCQTLVTDIYYRNYL 

HEWTCMPDQNDIIQAKKAYDLQSDALYKADLEWLRGIGWMPQGSPEVLRVKNAQNIFCDSVYR 

TPVVNLKYTSIVDTPEVVLAKSNAENISIPKYREVWDKDKTSIHIMPDTPEINLARANALNVS

NKLYREGWDEMKAGCDVRLDAIPIQAAKASREIASDYKYKLDHEKQKGHYVGTLTARDDNKIR 

WALIADKLQNEREYRLDWAKWKAKIQSPVDMLSILHSKNSQALVSDMDYRNYLHQWTCMPDQN 

DVIQAKKAYELQSDNVYKADLEWLRGIGWMPNDSVSVNHAKHAADIFSEKKYRTKIETLNFTP 

VDDRVDYVTAKQSGEILDDIKYRKDWNATKSKYTLTETPLLHTAQEAARILDQYLYKEGWERQ 

KATGYILPPDAVPFVHAHHCNDVQSELKYKAEHVKQKGHYVGVPTMRDDPKLVWFEHAGQIQN

ERLYKEDYHKTKAKINIPADMVSVLAAKQGQTLVSDIDYRNYLHQWMCHPDQNDVIQARKAYD 

LQSDNVYRADLEWLRGIGWIPLDSVDHVRVTKNQEMMSQIKYKKNALENYPNFTSVVDPPEIV

LAKINSVNQSDVKYKETFNKAKGKYTFSPDTPHISHSKDMGKLYSTILYKGAWEGTKAYGYTL 

DERYIPIVGAKHADLVNSELKYKETYEKQKGHYLAGKVIGEFPGVVHCLDFQKMRSALNYRKH

YEDTKANVHIPNDMMNHVLAKRCQYILSDLEYRHYFHQWTSLLEEPNVIRVRNAQEILSDMVY

KDDLNWLKGIGCYVWDTPQILHAKKSYDLQSQLQYTAAGKENLQNYNLVTDTPLYVTAVQSGI 

NASEVKYKENYHQIKDKYTTVLETVDYDRTRNLKNLYSSNLYKEAWDRVKATSYILPSSTLSL 

THAKNQKHLASHIKYREEYEKFKALYTLPRSVDDDPNTARCLRVGKLNIDRLYRSVYEKNKMK 

IHIVPDMVEMVTAKDSQKKVSEIDYRLRLHEWICHPDLQVNDHVRKVTDQISDIVYKDDLNWL 

KGIGCYVWDTPEILHAKHAYDLRDDIKYKAHMLKTRNDYKLVTDTPVYVQAVKSGKQLSDAVY 

HYDYVHSVRGKVAPTTKTVDLDRALHAYKLQSSNLYKTSLRTLPTGYRLPGDTPHFKHIKDTR

YMSSYFKYKEAYEHTKAYGYTLGPKDVPFVHVRRVNNVTSERLYRELYHKLKDKIHTTPDPPE 

IRQVKKTQEAVSELIYKSDFFKMQGHMISLPYTPQVIHCRYVGDITSDIKYKEDLQVLKGFGC 

FLYDTPDMVRSRHLRKLWSNYLYTDKAREMRDKYKVVLDTPEYRKVQELKTHLSELVYRAAGK

KQKSIFTSVPDTPDLLRAKRGQKLQSQYLYVELATKERPHHHAGNQTTALKHAKDVKDMVSEK 

KYKIQYEKMKDKYTPVPDTPILIRAKRAYWNASDLRYKETFQKTKGKYHTVKDALDIVYHRKV 

TDDISKIKYKENYMSQLGIWRSIPDRPEHFHHRAVTDTVSDVKYKEDLTWLKGIGCYAYDTPD 

FTLAEKNKTLYSKYKYKEVFERTKSDFKYVADSPINRHFKYATQLMNEKKYRADYEQRKDKYH 

LVVDEPRHLLAKTRSDQISQIKYRKNYEKSKDKFTSIVDTPEHLRTTKVNKQISDILYKLEYN

KAKPRGYTTIHDTPMLLHVRKVKDEVSDLKYKEVYQRNKSNCTIEPDAVHIKAAKDAYKVNTN

LDYKKQYEANKAHWKWTPDRPDFLQAAKSSLQQSDFEYKLDREFLKGCKLSVTDDKNTVLALR 

NTLIESDLKYKEKHVKERGTCHAVPDTPQILLAKTVSNLVSENKYKDHVKKHLAQGSYTTLPE 

TRDTVHVKEVTKHVSDTNYKKKFVKEKGKSNYSIMLEPPEVKHAMEVAKKQSDVAYRKDAKEK 

LHYTTVADRPDIKKATQAAKQASEVEYRAKHRKEGSHGLSMLGRPDIEMAKKAAKLSSQVKYR 

ENFDKEKGKTPKYNPKDSQLYKVMKDANNLASEVKYKADLKKLHKPVTDMKESLIMNHVLNTS

QLASSYQYKKKYEKSKGHYHTIPDNLEQLHLKEATELQSIVKYKEKYEKERGKPMLDFETPTY 

ITAKESQQMQSGKEYRKDYEESIKGRNLTGLEVTPALLHVKYATKIASEKEYRKDLEESIRGK 

GLTEMEDTPDMLRAKNATQILNEKEYKRDLELEVKGRGLNAMANETPDFMRARNATDIASQIK 

YKQSAEMEKANFTSVVDTPEIIHAQQVKNLSSQKKYKEDAEKSMSYYETVLDTPEIQRVRENQ

KNFSLLQYQCDLKNSKGKITVVQDTPEILRVKENQKNFSSVLYKEDVSPGTAIGKTPEMMRVK

QTQDHISSVKYKEAIGQGTPIPDLPEVKRVKETQKHISSVMYKENLGTGIPTTVTPEIERVKR 

NQENFSSVLYKENVGKATATPVTPEMQRVKRNQENISSVLYKENMRKATPTPVTPEMERAKRN 

QENISSVLYSDSFRKQIQGKAAYVLDTPEMRRVRETQRHISTVKYHEDFEKHKGCFTPVVTDP

ITERVKKNMQDFSDINYRGIQRKVVEMEQKRNDQDQETITGLRVWRTNPGSVFDYDPAEDNIQ

SRSLHMINVQAQRRSREQSRSASALSVSGGEEKSEHSEAPDHHLSTYSDGGVFAVSTAYKHAK

Sequence comparison of the above protein sequences yields the sequence relationships shown in Table 35B.



Table 35B. Comparison of NOV35a against NOV35b and NOV35c.

Protein Sequence | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV35b | 24..6700; 24..6669 | 6453/6677 (96%); 6456/6677 (96%)
NOV35c | 24..6400; 24..6405 | 6165/6386 (96%); 6185/6386 (96%)
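The identity and similarity cells in Table 35B pair a residue ratio with a whole-number percentage. The sketch below shows one way such cells could be parsed and re-derived; it assumes the percentages are truncated rather than rounded, which is consistent with the four values in this table but is an inference, not something the text states. The helper names `parse_ratio` and `truncated_percent` are hypothetical.

```python
import re
from typing import Tuple


def parse_ratio(cell: str) -> Tuple[int, int, int]:
    """Parse a cell such as '6453/6677 (96%)' into (matched, aligned, percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognised identity cell: {cell!r}")
    matched, aligned, percent = (int(g) for g in m.groups())
    return matched, aligned, percent


def truncated_percent(matched: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    return (100 * matched) // aligned


# Cells as listed in Table 35B for NOV35b and NOV35c.
for cell in ("6453/6677 (96%)", "6456/6677 (96%)",
             "6165/6386 (96%)", "6185/6386 (96%)"):
    matched, aligned, reported = parse_ratio(cell)
    print(cell, truncated_percent(matched, aligned) == reported)
```

A check of this kind is mainly useful for catching transcription errors in the ratios, since a mistyped digit usually changes the derived percentage.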



Further analysis of the NOV35a protein yielded the properties shown in Table 35C.



Table 35C. Protein Sequence Properties NOV35a

PSort analysis: | 0.8800 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: | No Known Signal Sequence Predicted
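The PSort row above lists several candidate compartments with probabilities. As an illustration only, the sketch below parses that text and selects the highest-scoring compartment; the variable `PSORT_TEXT` simply restates the row, and the function name `parse_psort` is hypothetical rather than part of any PSort tool.

```python
import re
from typing import Dict

# Text of the PSort row as given in Table 35C.
PSORT_TEXT = (
    "0.8800 probability located in nucleus; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)


def parse_psort(text: str) -> Dict[str, float]:
    """Map each predicted compartment to its reported probability."""
    pattern = re.compile(r"(\d\.\d+) probability located in ([^;]+)")
    return {location.strip(): float(prob)
            for prob, location in pattern.findall(text)}


scores = parse_psort(PSORT_TEXT)
best = max(scores, key=scores.get)
print(best, scores[best])
```

The listed probabilities are separate per-compartment scores and do not sum to one, so the code only ranks them.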




A search of the NOV35a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 35D.



Table 35D. Geneseq Results for NOV35a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABG01254 | Novel human diagnostic protein #1245 - Homo sapiens, 1573 aa. [WO200175067-A2, 11-OCT-2001] | 740..1871; 405..1518 | 482/1161 (41%); 756/1161 (64%) | 0.0
ABB12450 | ... protein SEQ ID NO: 289 - Homo sapiens, 1654 aa. [WO200174836-A1, 11-OCT-2001] | 1..432 | 336/500 (67%) | e-136
AAM40050 | Human polypeptide SEQ ID NO 3195 - Homo sapiens, 370 aa. [WO200153312-A1, 26-JUL-2001] | 3333..3660; 1..328 | 148/331 (44%); 226/331 (67%) | 4e-79
AAM28003 | Peptide #2040 encoded by probe for measuring placental gene expression - Homo sapiens, 102 aa. [WO200157272-A2, 09-AUG-2001] | 4008..4109; 1..102 | 102/102 (100%); 102/102 (100%) | 5e-55
AAM15512 | Peptide #1946 encoded by probe for measuring cervical gene expression - Homo sapiens, 102 aa. [WO200157278-A2, 09-AUG-2001] | 4008..4109; 1..102 | 102/102 (100%); 102/102 (100%) | 5e-55



In a BLAST search of public sequence databases, the NOV35a protein was found to
have homology to the proteins shown in the BLASTP data in Table 35E.



Table 35E. Public BLASTP Results for NOV35a

Protein Accession Number | Protein/Organism/Length | NOV35a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
P20929 | Nebulin - Homo sapiens (Human), 6669 aa. | 1..6700; 1..6669 | 6603/6705 (98%); 6619/6705 (98%) | 0.0
Q14215 | NEBULIN - Homo sapiens (Human), 3007 aa (fragment). | 3573..6561; 4..2980 | 2668/3012 (88%); 2770/3012 (91%) | 0.0
Q14214 | NEBULIN - Homo sapiens (Human), 2472 aa (fragment). | 280..2750; 1..2471 | 2460/2471 (99%); 2466/2471 (99%) | 0.0
Q9DEH4 | Nebulin - Gallus gallus (Chicken), 2402 aa (fragment). | 4011..6381; 1..2339 | 1572/2389 (65%); 1878/2389 (77%) | 0.0
Q62411 | Nebulin - Mus musculus (Mouse), 1358 aa (fragment). | 2116..3473; 1..1358 | 1238/1358 (91%); 1311/1358 (96%) | 0.0
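The residue ranges in the BLASTP results are written as `start..end`. As a small illustration, the sketch below converts such a range into the number of residues it covers, assuming the ranges are inclusive at both ends (the usual BLAST convention) and tolerating stray spaces around the dots; the helper names `parse_range` and `span` are hypothetical.

```python
import re
from typing import Tuple


def parse_range(cell: str) -> Tuple[int, int]:
    """Parse a residue range such as '3573..6561' into (start, end)."""
    m = re.fullmatch(r"\s*(\d+)\s*\.\s*\.\s*(\d+)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognised residue range: {cell!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if end < start:
        raise ValueError(f"range end precedes start: {cell!r}")
    return start, end


def span(cell: str) -> int:
    """Number of residues covered, counting both endpoints (assumed inclusive)."""
    start, end = parse_range(cell)
    return end - start + 1


# NOV35a residues versus match residues for the P20929 (human nebulin) hit.
print(span("1..6700"), span("1..6669"))
```

Comparing the two spans for a hit gives a quick sense of how much of the query and of the database entry the alignment covers.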



PFam analysis predicts that the NOV35a protein contains the domains shown in
Table 35F.



Table 35F. Domain Analysis of NOV35a 


Pfam Domain 


NOV35a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 
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Nebulirwepeat 


82..1I0 


14/29(48%) 
24/29 (83%) 


0.0011 


•Nebulin repeat 

1 
t 


II8..146 


12/29(41%) 
25/29 (86%) 


l.le-07 


Nebulin_repeat 


153.. 181 


10/29 (34%) 
25/29 (86%) 


8e-05 


Nebulinj-epeat 


188.216 


13/29(45%) 
24/29 (83%) 


5e-08 


Nebulin_repeat 


223..25I 


9/29 (31%) 
24/29 (83%) 


0.0022 


Nebulinj-epeat 


258..286 


14/29 (48%) 
25/29 (86%) 


4.9e-07 


Nebulinjepeat 


293.32 1 


11/29 (38%) 
21/29 (72%) 


0.00095 


Nebulin_repeat 


329..3S7 


17/29 (59%) 
24/29 (83%) 


8.3e-08 


Nebulinj-epeat 


368.396 


11/29(38%) 
26/29 (90%) 


2.7e-07 v) 


Nebulinjepeat 


439..467 


9/29 (31%) 
23/29 (79%) 


0.013 


Nebulin_repeat 


507..535 


12/29(41%) 
25/29 (86%) 


3e-06 


Nebulinjepeat 


542..570 


14/29(48%) 
26/29 (90%) 


2.1e-10 


Nebulin_repeat 


578..606 


16/29 (55%) 
26/29 (90%) 


4.7e-09 


Nebulinjepeat 

i 


6I6..644 


15/29 (52%) 
28/29 (97%) 


3.6e-10 


i ■ 1 

Nebulinjepeat 


686..714 


11/29 (38%) 
22/29 (76%) 


0.0072 


Nebulinj-epeat 


754.Z782 


11/29 (38%) 
25/29 (86%) 


8.2e-05 


Nebulin_repeat 

1 


789..8I7 


12/29(41%) 
26/29 (90%) 


4e-09 


r 

Nebulinjepeat 


825..853 


16/29 (55%) 
26/29 (90%) 


4e-09 


Nebulinjepeat 


863..891 


14/29 (48%) 
27/29 (93%) 


2.5e-09 


Nebulinjepeat 


929..957 


14/29(48%) 
26/29 (90%) 


7.4e-07 
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f- - ■ - 
jNebuliiwepeat 


965..993 


6/29 (21%) 
21/29 (72%) 


0.0043 


Nebulin_repeat 


998.. 1026 


10/29 (34%) 
21/29(72%) 


0.00033 


Nebulin_repeat 


1033..1061 


10/29 (34%) 
22/29 (76%) 


8.7e-05 


Nebulin_repeat 


1069.. 1097 


. 15/29(52%) 
27/29 (93%) 


1.4e-08 


Nebuiin_repeat 


1107..I135 


15/29 (52%) 
26/29 (90%) 


1.2e-08 


Nebulinjepeat 


1173.1201 


11/29 (38%) 
27/29 (93%) 


8.9e-08 


Nebulinj-epeat 


1209.. 1237 • 


5/29(17%) 
! 21/29 (72%) 


0.099 


Nebulirwepeat 


1242.. 1270 


13/29(45%) 
25/29 (86%) 


3.9e-09 


Nebulin_repeat 


1277.. 1305 


12/29 (41%) 
20/29 (69%) 


0.00026 


Nebulin_repeat 


1313.1341 


15/29 (52%) 
27/29 (93%) 


l.2e-08 


Nebulin_repeat 


1351. .1379 


14/29 (48%) 
27/29 (93%) 


6.7e-08 


jNebulin_repeat 


1417.. 1445 


7/29 (24%) 
21/29(72%) 


0.0012 


Nebulin_repeat 


1453.. 1481 


6/29 (21%) 
21/29 (72%) 


0.13 


Nebulin_repeat 


I486.. 1514 


12/29(41%) 
25/29 (86%) 


2.1e-06 


Nebulin_repeat 


1521..1549 


11/29 (38%) 
23/29 (79%) 


2.2e-06 


Ncbulinj*epeat 


1557.. 1585 


16/29 (55%) 
26/29 (90%) 


3.1e-08 


Nebulin_repeat 


1595.. 1623 


13/29 (45%) 
26/29 (90%) 


7.5e-08 


Nebulinjrepeat 


1661..1689 


7/29 (24%) 
26/29 (90%) 


2.6e-06 


Nebulin_repeat 


1 697.. 1725 


10/29 (34%) 
24/29 (83%) 


1.5e-06 


Nebulinjepeat 


1730..1758 


10/29 (34%) 
23/29 (79%) 


0.00049 
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Nebulirwepeat 


1765.. 1793 


• 13/29(45%) 
! 26/29 (90%) 


l.le-09 


Nebulirwepeat 


1801..1829 


]' 12/29(41%) 
! 25/29 (86%) 


1.3e-06 


Nebulin_repeat 


1839.. 1867 


| 14/29 (48%) 
| 27/29 (93%) 


4.2e-09 


Nebulin_repeat 


1905..1933 


j 1 1/29 (38%) 
[ 24/29 (83%) 


l.le-05 


Nebulin_repeat 


I941..1969 


7/29 (24%) 
19/29(66%) 


0.74 


Nebulinj-epeat 


1974..2002 


1 1/29 (38%) 
27/29 (93%) 


2.7e-08 


Nebulin_repeat 


2009..2037 


14/29(48%) 
1 25/29 (86%) 


3e-08 


Nebulinrepeat 


2045..2073 j 14/29(48%) 

: 25/29 (86%) 


le-06 


Nebiilin_repeat 


2083..2111 


13/29(45%) 
28/29 (97%) 


l.8e-08 


Nebuiin_repeat 


2149..2177 


10/29 (34%) 
25/29 (86%) 


2.1e-05 


Nebulin_repeat 


2185..2213 


8/29 (28%) 
19/29 (66%) 


0.7 


Nebulin_repeat 


22I8..2246 


11/29 (38%) 
25/29 (86%) 


1.4e-06 


Nebulin_repeat 


2253..2281 


12/29(41%) 
26/29 (90%) 


4.3e-10 


Nebulinjepeat 


2289..2317 


12/29(41%) 
26/29 (90%) 


5.2e-07 


Nebulin_repeat 


2327..235S 


14/29(48%) 
28/29 (97%) 


!.3e-09 


Nebulin_repeat 


2362.2389 


12/29 (41%) 
22/29 (76%) 


0.49 


Nebulin_repeat 


2393..2421 


13/29(45%) 
24/29 (83%) 


7.6e-07 


Nebulinjepeat 


2428..2456 


6/29 (21%) 
21/29 (72%) 


0.52 


Nebulin_jepeat 


2461. .2489 


15/29 (52%) 
27/29 (93%) 


5.6e-10 


Nebulin_repeat 


2496..2524 


15/29 (52%) 
23/29 (79%) 


4.4e-08 
Nebulin_repeat    2532..2560    11/29 (38%)    25/29 (86%)    1.4e-06
Nebulin_repeat    2570..2598    13/29 (45%)    28/29 (97%)    2.5e-10
Nebulin_repeat    2636..2664    11/29 (38%)    23/29 (79%)    4e-06
Nebulin_repeat    2704..2732    11/29 (38%)    27/29 (93%)    2.2e-07
Nebulin_repeat    2739..2767    13/29 (45%)    26/29 (90%)    8.2e-09
Nebulin_repeat    2775..2803    12/29 (41%)    24/29 (83%)    5e-06
Nebulin_repeat    2813..2841    13/29 (45%)    28/29 (97%)    2.5e-10
Nebulin_repeat    2848..2875    12/29 (41%)    21/29 (72%)    0.9
Nebulin_repeat    2879..2907    10/29 (34%)    23/29 (79%)    6.2e-05
Nebulin_repeat    2914..2942    5/29 (17%)     20/29 (69%)    0.45
Nebulin_repeat    2947..2975    12/29 (41%)    27/29 (93%)    2.1e-07
Nebulin_repeat    2982..3010    13/29 (45%)    25/29 (86%)    3.9e-08
Nebulin_repeat    3018..3046    13/29 (45%)    24/29 (83%)    2.3e-06
Nebulin_repeat    3056..3084    13/29 (45%)    28/29 (97%)    2.5e-10
Nebulin_repeat    3091..3119    12/29 (41%)    21/29 (72%)    0.83
Nebulin_repeat    3122..3150    10/29 (34%)    22/29 (76%)    0.00018
Nebulin_repeat    3157..3185    8/29 (28%)     20/29 (69%)    0.077
Nebulin_repeat    3190..3218    12/29 (41%)    27/29 (93%)    2.1e-07
Nebulin_repeat    3225..3253    12/29 (41%)    26/29 (90%)    6.5e-08
Nebulin_repeat    3261..3289    14/29 (48%)    26/29 (90%)    6.7e-08
Nebulin_repeat    3299..3327    14/29 (48%)    25/29 (86%)    2.5e-06
Nebulin_repeat    3365..3393    14/29 (48%)    25/29 (86%)    1.8e-08
Nebulin_repeat    3400..3428    11/29 (38%)    21/29 (72%)    5.9e-05
Nebulin_repeat    3433..3461    13/29 (45%)    27/29 (93%)    1.1e-08
Nebulin_repeat    3468..3496    15/29 (52%)    26/29 (90%)    5.5e-09
Nebulin_repeat    3504..3532    13/29 (45%)    26/29 (90%)    1.8e-07
Nebulin_repeat    3542..3570    12/29 (41%)    28/29 (97%)    3.9e-09
Nebulin_repeat    3608..3636    12/29 (41%)    22/29 (76%)    0.00011
Nebulin_repeat    3643..3671    7/29 (24%)     20/29 (69%)    0.07
Nebulin_repeat    3676..3704    15/29 (52%)    25/29 (86%)    1e-08
Nebulin_repeat    3711..3739    11/29 (38%)    27/29 (93%)    1.5e-08
Nebulin_repeat    3747..3775    15/29 (52%)    26/29 (90%)    1.6e-07
Nebulin_repeat    3785..3813    11/29 (38%)    25/29 (86%)    2.7e-07
Nebulin_repeat    3851..3879    14/29 (48%)    24/29 (83%)    4.8e-07
Nebulin_repeat    3919..3947    16/29 (55%)    25/29 (86%)    3.6e-09
Nebulin_repeat    3954..3982    13/29 (45%)    26/29 (90%)    5.3e-09
Nebulin_repeat    3989..4017    14/29 (48%)    27/29 (93%)    1.1e-08
Nebulin_repeat    4027..4055    10/29 (34%)    23/29 (79%)    0.00011
Nebulin_repeat    4093..4121    13/29 (45%)    24/29 (83%)    6.8e-07
Nebulin_repeat    4128..4156    8/29 (28%)     19/29 (66%)    0.15
Nebulin_repeat    4161..4189    9/29 (31%)     24/29 (83%)    5.2e-07
Nebulin_repeat    4195..4223    9/29 (31%)     25/29 (86%)    6.1e-06
Nebulin_repeat    4231..4259    11/29 (38%)    24/29 (83%)    2.6e-06
Nebulin_repeat    4269..4297    11/29 (38%)    27/29 (93%)    5.6e-08
Nebulin_repeat    4335..4363    11/29 (38%)    24/29 (83%)    5.9e-06
Nebulin_repeat    4370..4398    8/29 (28%)     23/29 (79%)    0.0069
Nebulin_repeat    4405..4433    20/29 (69%)    25/29 (86%)    1.2e-10
Nebulin_repeat    4440..4468    12/29 (41%)    23/29 (79%)    8.3e-06
Nebulin_repeat    4476..4504    14/29 (48%)    21/29 (72%)    0.00042
Nebulin_repeat    4549..4577    8/29 (28%)     21/29 (72%)    0.14
Nebulin_repeat    4580..4608    8/29 (28%)     24/29 (83%)    2e-05
Nebulin_repeat    4615..4643    11/29 (38%)    26/29 (90%)    1.8e-06
Nebulin_repeat    4650..4678    13/29 (45%)    23/29 (79%)    2.1e-06
Nebulin_repeat    4685..4713    7/29 (24%)     23/29 (79%)    0.00067
Nebulin_repeat    4721..4749    12/29 (41%)    23/29 (79%)    2.8e-05
Nebulin_repeat    4759..4787    10/29 (34%)    20/29 (69%)    0.015
Nebulin_repeat    4794..4822    8/29 (28%)     21/29 (72%)    0.14
Nebulin_repeat    4825..4853    9/29 (31%)     20/29 (69%)    0.037
Nebulin_repeat    4860..4888    13/29 (45%)    23/29 (79%)    2.3e-06
Nebulin_repeat    4895..4923    12/29 (41%)    21/29 (72%)    0.00025
Nebulin_repeat    4966..4994    12/29 (41%)    20/29 (69%)    0.0011
Nebulin_repeat    5002..5030    7/29 (24%)     20/29 (69%)    0.37
Nebulin_repeat    5037..5065    12/29 (41%)    22/29 (76%)    3.3e-05
Nebulin_repeat    5072..5100    11/29 (38%)    22/29 (76%)    0.0029
Nebulin_repeat    5107..5135    8/29 (28%)     22/29 (76%)    0.024
Nebulin_repeat    5142..5170    8/29 (28%)     23/29 (79%)    0.00063
Nebulin_repeat    5177..5205    13/29 (45%)    22/29 (76%)    0.2
Nebulin_repeat    5212..5240    11/29 (38%)    21/29 (72%)    0.028
Nebulin_repeat    5247..5275    13/29 (45%)    23/29 (79%)    3.1e-08
Nebulin_repeat    5282..5310    11/29 (38%)    24/29 (83%)    0.21
Nebulin_repeat    5317..5345    13/29 (45%)    23/29 (79%)    3e-06
Nebulin_repeat    5352..5380    15/29 (52%)    24/29 (83%)    2.1e-07
Nebulin_repeat    5387..5415    10/29 (34%)    22/29 (76%)    2.1e-05
Nebulin_repeat    5422..5450    12/29 (41%)    24/29 (83%)    7.9e-06
Nebulin_repeat    5457..5485    13/29 (45%)    24/29 (83%)    1.9e-07
Nebulin_repeat    5493..5521    12/29 (41%)    24/29 (83%)    2.1e-06
Nebulin_repeat    5563..5591    13/29 (45%)    21/29 (72%)    0.00026
Nebulin_repeat    5598..5626    14/29 (48%)    22/29 (76%)    0.0018
Nebulin_repeat    5633..5661    13/29 (45%)    23/29 (79%)    1.3e-05
Nebulin_repeat    5670..5698    12/29 (41%)    25/29 (86%)    4.7e-06
Nebulin_repeat    5707..5735    11/29 (38%)    21/29 (72%)    0.0097
Nebulin_repeat    5741..5769    11/29 (38%)    22/29 (76%)    0.00015
Nebulin_repeat    5776..5804    17/29 (59%)    25/29 (86%)    2.2e-08
Nebulin_repeat    5848..5876    9/29 (31%)     24/29 (83%)    0.00049
Nebulin_repeat    5883..5911    16/29 (55%)    25/29 (86%)    4e-07
Nebulin_repeat    5918..5946    10/29 (34%)    27/29 (93%)    1.1e-06
Nebulin_repeat    5955..5983    8/29 (28%)     25/29 (86%)    1.1e-05
Nebulin_repeat    5992..6020    10/29 (34%)    26/29 (90%)    1.5e-07
Nebulin_repeat    6030..6058    10/29 (34%)    26/29 (90%)    1e-07
Nebulin_repeat    6065..6093    15/29 (52%)    26/29 (90%)    1.3e-08
Nebulin_repeat    6100..6128    12/29 (41%)    26/29 (90%)    5e-06
Nebulin_repeat    6135..6163    14/29 (48%)    22/29 (76%)    5.7e-06
Nebulin_repeat    6166..6194    10/29 (34%)    20/29 (69%)    0.066
Nebulin_repeat    6197..6225    10/29 (34%)    20/29 (69%)    0.048
Nebulin_repeat    6228..6256    12/29 (41%)    22/29 (76%)    4e-05
Nebulin_repeat    6259..6287    11/29 (38%)    22/29 (76%)    1.1e-05
Nebulin_repeat    6290..6318    11/29 (38%)    22/29 (76%)    1.1e-05
Nebulin_repeat    6321..6349    11/29 (38%)    24/29 (83%)    5.5e-07
Nebulin_repeat    6352..6380    11/29 (38%)    22/29 (76%)    9.6e-06
Nebulin_repeat    6383..6411    11/29 (38%)    22/29 (76%)    6.7e-05
Nebulin_repeat    6414..6442    10/29 (34%)    21/29 (72%)    2e-05
Nebulin_repeat    6450..6478    14/29 (48%)    25/29 (86%)    3e-09
SH3               6644..6700    24/58 (41%)    47/58 (81%)    2.6e-17





Example 36. 

The NOV36 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 36A. 
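
As a rough illustration of how an encoded polypeptide follows from a nucleotide sequence, the sketch below translates the opening codons of the NOV36a DNA sequence using the standard genetic code. The function name `translate` and the compact codon-table layout are illustrative assumptions and are not part of the analysis described here.

```python
# Standard genetic code, laid out with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases (e.g. N) are reported as X.
        residue = CODON_TABLE.get(dna[pos:pos + 3].upper(), "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 24 bases of the NOV36a DNA sequence listed in Table 36A.
print(translate("ATGTCGGAAGAAACCCGACAGAGC"))  # MSEETRQS
```

The printed residues agree with the first eight residues of the NOV36a protein sequence listed in Table 36A.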




Table 36A. NOV36 Sequence Analysis 




SEQ ID NO: 145    2973 bp

NOV36a, CG120166-01 DNA Sequence


ATGTCGGAAGAAACCCGACAGAGCAAATTGGCCGCAGCGAAGAAAAAGTTGAGAGAATATCAG 
CAGAGGAATAGCCCTGGTGTTCCTACAGGAGCGAAAAAGAAGAAGAAAATAAAAAATGGCAGT 
AACCCTGAGACAACCACTTCTGGTGGTTGCCACTCACCTGAGGATACACCCAAGGACAATGCT 
GCTACTCTACAACCATCTGATGACACCGTGTTACCTGGCGGTGTCCCTTCCCCTGGTGCCAGT 
CTCACTAGCATGGCGGCATCTCAGAATCATGATGCTGACAATGTCCCTAATCTCATGGATGAA 
ACCAAGACTTTCTCATCAACCGAGAGCCTGCGACAACTCTCCCAACAGCTCAATGGTCTTGTT 
TGTGAGTCTGCGACATGTGTCAATGGGGAGGGCCCTGCATCGTCTGCTAACCTGAAGGATCTG
GAGAGCCGGTACCAACAGCTAGCGGTAGCCCTGGACTCCAGCTATGTAACAAACAAACAACTC 
AATATCACGATAGAGAAATTGAAACAACAGAACCAAGAAATTACGGATCAGTTGGAAGAAGAA
AAGAAAGAATGCCACCAAAAGCAGGGAGCCCTAAGGGAGCAGTTACAGGTTCACATTCAGACC 
ATAGGGATCCTCGTATCAGAGAAAGCTGAGTTACAGACAGCCCTGGCTCACACTCAGCATGCT 
GCCAGGCAGAAAGAAGGAGAGTCTGAAGATCTGGCCAGCCGCCTGCAGTATTCCCGGCGGCGT
GTGGGAGAGTTGGAGCGGGCTCTCTCTGCTGTCTCCACGCAGCAGAAGAAGGCAGACAGGTAC 
AACAAGGAGTTAACCAAAGAGAGAGACGCCCTCAGGCTGGAGTTATACAAGAACACCCAAAGC 
AATGAGGACCTGAAGCAAGAGAAATCAGAATTGGAAGAGAAGCTTCGGGTCCTAGTGACTGAG 
AAGGCTGGCATGCAGCTTAACTTGGAAGAATTGCAAAAGAAGTTAGAGATGACGGAACTCCTG
CTTCAACAGTTTTCAAGCCGGTGTGAAGCCCCTGATGCTAACCAGCAGTTACAGCAGGCCATG 
GAGGAGCGGGCACAGCTGGAAGCACACCTGGGGCAGGTAATGGAGTCGGTTAGACAACTACAA 
ATGGAGAGAGATAAATATGCGGAGAATCTCAAAGGAGAGAGCGCCATGTGGCGGCAGAGGATG
CAGCAGATGTCAGAGCAGGTGCACACATTGAGAGAGGAGAAGGAATGTAGCATGAGTCGGGTA 
CAGGAGCTGGAGACGAGCTTGGCTGAACTGAGGAACCAGATGGCTGAACCCCCGCCCCCAGAG
CCCCCAGCAGGGCCCTCCGAGGTGGAGCAGCAGCTACAAGCGGAGGCTGAGCACCTGCGGAAG 
GAGCTGGAGGGTCTGGCAGGACAGCTTCAAGCCCAGGTGCAAGACAATGAGGGCTTGAGTCGC
CTGAACCGGGAGCAGGAGGAGAGGCTGCTGGAGCTGGAGCGGGCGGCCGAGCTCTGGGGGGAG 
CAGGCGGAGGCGCGCAGGCAAATCCTGGAGACCATGCAGAACGACCGCACTACCATCAGCCGC 
GCACTCTCCCAGAACCGGGAGCTCAAGGAGCAGCTGGCTGAGCTGCAGAGCGGATTTGTAAAG
CTGACTAATGAGAACATGGAGATCACCAGCGCACTGCAGTCGGAGCAGCACGTCAAGAGGGAG 
CTGGGAAAGAAGCTGGGCGAGCTGCAGGAGAAGCTGAGCGAGCTGAAGGAAACGGTGGAGCTG 
AAGAGCCAAGAGGCTCAAAGTCTGCAGCAGCAGCGAGACCAGTACCTGGGACACCTGCAGCAG 
TATGTGGCCGCCTATCAGCAGCTGACCTCTGAGAAGGAGGTGCTGCATAATCAGCTACTGCTG 
CAGACCCAGCTCGTGGACCAGCTGCAGCAGCAGGAAGCTCAGGGCAAAGCGGTGGCCGAGATG 
GCCCGCCAAGAGTTGCAGGAAACCCAGGAGCGCCTGGAAGCTGCCACCCAGCAGAATCAGCAG 
CTACGGGCCCAGTTGAGCCTCATGGCTCACCCTGGGGAAGGAGATGGACTGGACCGGGAGGAG 
GAGGAGGATGAGGAGGAGGAGGAGGAGGAGGCGGTGGCAGTACCTCAGCCCATGCCAAGCATC 
CCGGAGGACCTGGAGAGCCGGGAAGCCATGGTGGCATTTTTCAACTCAGCTGTAGCCAGTGCC 
GAGGAGGAGCAGGCAAGGCTACGTGGGCAGCTGAAGGAGCAAAGGGTGCGCTGCCGGCGCCTG
GCTCACCTGCTGGCCTCGGCCCAGAAGGAGCCTGAGGCAGCAGCCCCAGCCCCAGGGACCGGG 
GGTGATTCTGTGTGTGGGGAGACCCACCGGGCCCTGCAGGGGGCCATGGAGAAGCTGCAGAGC 
CGCTTTATGGAGCTCATGCAGGAGAAGGCAGACCTGAAGGAGAGGGTAGAGGAACTGGAACAT 
CGCTGCATCCAGCTTTCTGGAGAGACAGACACCATTGGAGAGTACATTGCACTGTACCAGAGC 
CAGAGGGCAGTGCTGAAGGAGCGGCACCGGGAGAAGGAGGAGTACATCAGCAGGCTGGCCCAA 
GACAAGGAGGAGATGAAGGTGAAGCTGCTGGAGCTGCAGGAGCTGGTCTTACGGCTTGTGGGC 
GACCGCAACGAGTGGCATGGCAGATTCCTGGCAGCTGCCCAGAACCCTGCTGATGAGCCCACT 
TCAGGGGCCCCAGCCCCCCAGGAACTTGGGGCTGCCAACCAGCAGGGTGATCTTTGCGAGGTG
AGCCTCGCCGGCAGTGTGGAGCCTGCCCAAGGAGAGGCCAGGGAGGGTTCTCCCCGTGACAAC 
CCCACTGCACAGCAGATCATGCAGCTGCTTCGTGAGATGCAGAACCCCCGGGAGCGCCCAGGC 
TTGGGCAGCAACCCCTGCATTCCTTTTTTTTACCGGGCTGACGAGAATGATGAGGTGAAGATC 
ACTGTCATCTAA 



ORF Start: ATG at 1    ORF Stop: TAA at 2971

SEQ ID NO: 146    990 aa    MW at 111657.5 kD

NOV36a, CG120166-01 Protein Sequence



MSEETRQSKLAAAKKKLREYQQRNSPGVPTGAKKKKKIKNGSNPETTTSGGCHSPEDTPKDNA 
ATLQPSDDTVLPGGVPSPGASLTSMAASQNHDADNVPNLMDETKTFSSTESLRQLSQQLNGLV 
CESATCVNGEGPASSANLKDLESRYQQLAVALDSSYVTNKQLNITIEKLKQQNQEITDQLEEE 
KKECHQKQGALREQLQVHIQTIGILVSEKAELQTALAHTQHAARQKEGESEDLASRLQYSRRR 
VGELERALSAVSTQQKKADRYNKELTKERDALRLELYKNTQSNEDLKQEKSELEEKLRVLVTE 
KAGMQLNLEELQKKLEMTELLLQQFSSRCEAPDANQQLQQAMEERAQLEAHLGQVMESVRQLQ 
MERDKYAENLKGESAMWRQRMQQMSEQVHTLREEKECSMSRVQELETSLAELRNQMAEPPPPE 
PPAGPSEVEQQLQAEAEHLRKELEGLAGQLQAQVQDNEGLSRLNREQEERLLELERAAELWGE 
QAEARRQILETMQNDRTTISRALSQNRELKEQLAELQSGFVKLTNENMEITSALQSEQHVKRE 
LGKKLGELQEKLSELKETVELKSQEAQSLQQQRDQYLGHLQQYVAAYQQLTSEKEVLHNQLLL 
QTQLVDQLQQQEAQGKAVAEMARQELQETQERLEAATQQNQQLRAQLSLMAHPGEGDGLDREE 
EEDEEEEEEEAVAVPQPMPSIPEDLESREAMVAFFNSAVASAEEEQARLRGQLKEQRVRCRRL 
AHLLASAQKEPEAAAPAPGTGGDSVCGETHRALQGAMEKLQSRFMELMQEKADLKERVEELEH 
RCIQLSGETDTIGEYIALYQSQRAVLKERHREKEEYISRLAQDKEEMKVKLLELQELVLRLVG
DRNEWHGRFLAAAQNPADEPTSGAPAPQELGAANQQGDLCEVSLAGSVEPAQGEAREGSPRDN 
PTAQQIMQLLREMQNPRERPGLGSNPCIPFFYRADENDEVKITVI 



SEQ ID NO: 147    2886 bp

NOV36b, CG120166-02 DNA Sequence



CCTCCCCGCCCCGCGATGTCGGAAGAAACCCGACAGAGCAAATTGGCCGCAGCGAAGAAAAAG



TTGAGAGAATATCAGCAGAGGAATAGCCCTGGTGTTCCTACAGGAGCGAAAAAGAAGAAGAAA 

ATAAAAAATGGCAGTAACCCTGAGACAACCACTTCTGGTGGTTGCCACTCACCTGAGGATACA

CCCAAGGACAATGCTGCTACTCTACAACCATCTGATGACACCGTGTTACCTGGCGGTGTCCCT 

TCCCCTGGTGCCAGTCTCACTAGCATGGCGGCATCTCAGAATCATGATGCTGACAATGTCCCT 

AATCTCATGGATGAAACCAAGACTTTCTCATCAACCGAGAGCCTGCGACAACTCTCCCAACAG

CTCAATGGTCTTGTTTGTGAGTCTGCGACATGTGTCAATGGGGAGGGCCCTGCATCGTCTGCT 

AACCTGAAGGATCTGGAGAGCCGGTACCAACAGCTAGCGGTAGCCCTGGACTCCAGCTATGTA 

ACAAACAAACAACTCAATATCACGATAGAGAAATTGAAACAACAGAACCAAGAAATTACGGAT 

CAGTTGGAAGAAGAAAAGAAAGAATGCCACCAAAAGCAGGGAGCCCTAAGGGAGCAGTTACAG 

GTTCACATTCAGACCATAGGGATCCTCGTATCAGAGAAAGCTGAGTTACAGACAGCCCTGGCT 

CACACTCAGCATGCTGCCAGGCAGAAAGAAGGAGAGTCTGAAGATCTGGCCAGCCGCCTGCAG 

TATTCCCGGCGGCGTGTGGGAGAGTTGGAGCGGGCTCTCTCTGCTGTCTCCACGCAGCAGAAG 

AAGGCAGACAGGTACAACAAGGAGTTAACCAAAGAGAGAGACGCCCTCAGGCTGGAGTTATAC 

AAGAACACCCAAAGCAATGAGGACCTGAAGCAAGAGAAATCAGAATTGGAAGAGAAGCTTCGG 

GTCCTAGTGACTGAGAAGGCTGGCATGCAGCTTAACTTGGAAGAATTGCAAAAGAAGTTAGAG 

ATGACGGAACTCCTGCTTCAACAGTTTTCAAGCCGGTGTGAAGCCCCTGATGCTAACCAGCAG 

TTACAGCAGGCCATGGAGGAGCGGGCACAGCTGGAAGCACACCTGGGGCAGGTAATGGAGTCG 

GTTAGACAACTACAAATGGAGAGAGATAAATATGCGGAGAATCTCAAAGGAGAGAGCGCCATG 

TGGCGGCAGAGGATGCAGCAGATGTCAGAGCAGGTGCACACATTGAGAGAGGAGAAGGAATGT 

AGCATGAGTCGGGTACAGGAGCTGGAGACGAGCTTGGCTGAACTGAGGAACCAGATGGCTGAA 

CCCCCGCCCCCAGAGCCCCCAGCAGGGCCCTCCGAGGTGGAGCAGCAGCTACAAGCGGAGGCT 

GAGCACCTGCGGAAGGAGCTGGAGGGTCTGGCAGGACAGCTTCAAGCCCAGGTGCAAGACAAT

GAGGGCTTGAGTCGCCTGAACCGGGAGCAGGAGGAGAGGCTGCTGGAGCTGGAGCGGGCGGCC

GAGCTCTGGGGGGAGCAGGCGGAGGCGCGCAGGCAAATCCTGGAGACCATGCAGAACGACCGC 

ACTACCATCAGCCGCGCACTCTCCCAGAACCGGGAGCTCAAGGAGCAGCTGGCTGAGCTGCAG

AGCGGATTTGTAAAGCTGACTAATGAGAACATGGAGATCACCAGCGCACTGCAGTCGGAGCAG 

CACGTCAAGAGGGAGCTGGGAAAGAAGCTGGGCGAGCTGCAGGAGAAGCTGAGCGAGCTGAAG

GAAACGGTGGAGCTGAAGAGCCAAGAGGCTCAAAGTCTGCAGACCCAGCTCGTGGACCAGCTG 

CAGCAGCAGGAAGCTCAGGGCAAAGCGGTGGCCGAGATGGCCCGCCAAGAGTTGCAGGAAACC 

CAGGAGCGCCTGGAAGCTGCCACCCAGCAGAATCAGCAGCTACGGGCCCAGTTGAGCCTCATG 

GCTCACCCTGGGGAAGGAGATGGACTGGACCGGGAGGAGGAGGAGGATGAGGAGGAGGAGGAG 

GAGGAGGCGGTGGCAGTACCTCAGCCCATGCCAAGCATCCCGGAGGACCTGGAGAGCCGGGAA 

GCCATGGTGGCATTTTTCAACTCAGCTGTAGCCAGTGCCGAGGAGGAGCAGGCAAGGCTACGT 

GGGCAGCTGAAGGAGCAAAGGGTGCGCTGCCGGCGCCTGGCTCACCTGCTGGCCTCGGCCCAG 

AAGGAGCCTGAGGCAGCAGCCCCAGCCCCAGGGACCGGGGGTGATTCTGTGTGTGGGGAGACC 

CACCGGGCCCTGCAGGGGGCCATGGAGAAGCTGCAGAGCCGCTTTATGGAGCTCATGCAGGAG 

AAGGCAGACCTGAAGGAGAGGGTAGAGGAACTGGAACATCGCTGCATCCAGCTTTCTGGAGAG 

ACAGACACCATTGGAGAGTACATTGCACTGTACCAGAGCCAGAGGGCAGTGCTGAAGGAGCGG 

CACCGGGAGAAGGAGGAGTACATCAGCAGGCTGGCCCAAGACAAGGAGGAGATGAAGGTGAAG 


CTGCTGGAGCTGCAGGAGCTGGTCTTACGGCTTGTGGGCGACCGCAACGAGTGGCATGGCAGA 
TTCCTGGCAGCTGCCCAGAACCCTGCTGATGAGCCCACTTCAGGGGCCCCAGCCCCCCAGGAA 
CTTGGGGCTGCCAACCAGCAGGGTGATCTTTGCGAGGTGAGCCTCGCCGGCAGTGTGGAGCCT 
GCCCAAGGAGAGGCCAGGGAGGGTTCTCCCCGTGACAACCCCACTGCACAGCAGATCATGCAG 
CTGCTTCGTGAGATGCAGAACCCCCGGGAGCGCCCAGGCTTGGGCAGCAACCCCTGCATTCCT 
TTTTTTTACCGGGCTGACGAGAATGATGAGGTGAAGATCACTGTCATCTAA 




ORF Start: ATG at 16    ORF Stop: TAA at 2884

SEQ ID NO: 148    956 aa    MW at 107591.0 kD

NOV36b, CG120166-02 Protein Sequence


MSEETRQSKLAAAKKKLREYQQRNSPGVPTGAKKKKKIKNGSNPETTTSGGCHSPEDTPKDNA 
ATLQPSDDTVLPGGVPSPGASLTSMAASQNHDADNVPNLMDETKTFSSTESLRQLSQQLNGLV 
CESATCVNGEGPASSANLKDLESRYQQLAVALDSSYVTNKQLNITIEKLKQQNQEITDQLEEE 
KKECHQKQGALREQLQVHIQTIGILVSEKAELQTALAHTQHAARQKEGESEDLASRLQYSRRR
VGELERALSAVSTQQKKADRYNKELTKERDALRLELYKNTQSNEDLKQEKSELEEKLRVLVTE 
KAGMQLNLEELQKKLEMTELLLQQFSSRCEAPDANQQLQQAMEERAQLEAHLGQVMESVRQLQ 
MERDKYAENLKGESAMWRQRMQQMSEQVHTLREEKECSMSRVQELETSLAELRNQMAEPPPPE 
PPAGPSEVEQQLQAEAEHLRKELEGLAGQLQAQVQDNEGLSRLNREQEERLLELERAAELWGE 
QAEARRQILETMQNDRTTISRALSQNRELKEQLAELQSGFVKLTNENMEITSALQSEQHVKRE 
LGKKLGELQEKLSELKETVELKSQEAQSLQTQLVDQLQQQEAQGKAVAEMARQELQETQERLE 
AATQQNQQLRAQLSLMAHPGEGDGLDREEEEDEEEEEEEAVAVPQPMPSIPEDLESREAMVAF 
FNSAVASAEEEQARLRGQLKEQRVRCRRLAHLLASAQKEPEAAAPAPGTGGDSVCGETHRALQ 
GAMEKLQSRFMELMQEKADLKERVEELEHRCIQLSGETDTIGEYIALYQSQRAVLKERHREKE 
EYISRLAQDKEEMKVKLLELQELVLRLVGDRNEWHGRFLAAAQNPADEPTSGAPAPQELGAAN
QQGDLCEVSLAGSVEPAQGEAREGSPRDNPTAQQIMQLLREMQNPRERPGLGSNPCIPFFYRA 
DENDEVKITVI



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 36B. 
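
One simple way to see where two closely related variants diverge is to measure how much sequence they share at each end. The sketch below is only an illustration under the assumption that the shorter variant differs from the longer one by a single contiguous deletion; the helper name `locate_single_deletion` is hypothetical, and the two strings are short segments copied from the NOV36a and NOV36b protein sequences listed above.

```python
def locate_single_deletion(longer: str, shorter: str) -> tuple[int, int, int]:
    """Return (shared prefix length, shared suffix length, length difference).

    If prefix + suffix equals len(shorter), the two sequences are consistent
    with one contiguous deletion of `length difference` residues.
    """
    limit = min(len(longer), len(shorter))
    prefix = 0
    while prefix < limit and longer[prefix] == shorter[prefix]:
        prefix += 1
    suffix = 0
    # The suffix may not overlap residues already counted in the prefix.
    while suffix < limit - prefix and longer[-1 - suffix] == shorter[-1 - suffix]:
        suffix += 1
    return prefix, suffix, len(longer) - len(shorter)


# Segments taken from the NOV36a and NOV36b protein listings.
nov36a_segment = "SLQQQRDQYLGHLQQYVAAYQQLTSEKEVLHNQLLLQTQLVDQ"
nov36b_segment = "SLQTQLVDQ"
print(locate_single_deletion(nov36a_segment, nov36b_segment))  # (3, 6, 34)
```

The length difference of 34 residues equals the difference between the listed lengths of the two proteins (990 aa and 956 aa).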



Table 36B. Comparison of NOV36a against NOV36b.

Protein Sequence    NOV36a Residues/Match Residues    Identities/Similarities for the Matched Region
NOV36b              1..990 / 1..956                   790/990 (79%) / 790/990 (79%)

Further analysis of the NOV36a protein yielded the following properties shown in 
Table 36C. 



Table 36C. Protein Sequence Properties NOV36a

PSort analysis:    0.6000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis:  No Known Signal Sequence Predicted



A search of the NOV36a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 36D.
Table 36D. Geneseq Results for NOV36a

Geneseq Identifier    Protein/Organism/Length [Patent #, Date]    NOV36a Residues/Match Residues    Identities/Similarities for the Matched Region    Expect Value
AAM93868    Human polypeptide SEQ ID NO: 3973 - Homo sapiens, 360 aa. [EP1130094-A2, 05-SEP-2001]    88..438 / 1..351    348/351 (99%) / 348/351 (99%)    0.0
ABG20674    Novel human diagnostic protein #20665 - Homo sapiens, 1262 aa. [WO200175067-A2, 11-OCT-2001]    148..960 / 578..1261    360/842 (42%) / 466/842 (54%)    e-146
ABG20672    Novel human diagnostic protein #20663 - Homo sapiens, 1717 aa. [WO200175067-A2, 11-OCT-2001]    128..840 / 1038..1632    269/738 (36%) / 364/738 (48%)    6e-89
ABG03395    Novel human diagnostic protein #3386 - Homo sapiens, 633 aa. [WO200175067-A2, 11-OCT-2001]    294..633 / 55..403    181/367 (49%) / 228/367 (61%)    2e-78
ABG20896    Novel human diagnostic protein #20887 - Homo sapiens, 1213 aa. [WO200175067-A2, 11-OCT-2001]    503..831 / 332..635    155/332 (46%) / 199/332 (59%)    6e-62


In a BLAST search of public sequence databases, the NOV36a protein was found to
have homology to the proteins shown in the BLASTP data in Table 36E. 
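
Expect values in these tables appear in three textual forms: plain decimals such as 0.0, scientific notation such as 6e-89, and a shorthand such as e-146 that omits the leading mantissa. The sketch below is a minimal, assumed way to normalise all three into numbers for sorting or filtering; the helper name `parse_expect` is hypothetical, and it reads the shorthand as a mantissa of 1.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect value string such as '0.0', '6e-89' or 'e-146' to a float."""
    text = text.strip()
    if text.lower().startswith("e"):
        # Shorthand with no mantissa: treat 'e-146' as '1e-146'.
        text = "1" + text
    return float(text)


for raw in ("0.0", "6e-89", "e-146"):
    print(raw, "->", parse_expect(raw))
```

Smaller values indicate matches that are less likely to arise by chance, so sorting on the parsed numbers orders the hits from strongest to weakest.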

Table 36E. Public BLASTP Results for NOV36a

Protein Accession Number    Protein/Organism/Length    NOV36a Residues/Match Residues    Identities/Similarities for the Matched Portion    Expect Value
Q9NYF9    Golgi matrix protein GM130 - Homo sapiens (Human), 990 aa.    1..990 / 1..990    990/990 (100%) / 990/990 (100%)    0.0
Q62839    Cis-golgi matrix protein GM130 - Rattus norvegicus (Rat), 986 aa.    1..990 / 1..986    748/1003 (74%) / 841/1003 (83%)    0.0
Q08379    Golgin-95 - Homo sapiens (Human), 620 aa.    371..990 / 1..620    620/620 (100%) / 620/620 (100%)    0.0
Q921M4    Unknown (protein for MGC:11816) - Mus musculus (Mouse), 617 aa.    371..990 / 1..617    466/627 (74%) / 522/627 (82%)    0.0
Q9BRB0    Hypothetical 38.8 kDa protein - Homo sapiens (Human), 345 aa.    516..813 / 1..297    296/298 (99%) / 296/298 (99%)    e-160
PFam analysis predicts that the NOV36a protein contains the domains shown in
Table 36F. 



Table 36F. Domain Analysis of NOV36a

Pfam Domain    NOV36a Match Region    Identities/Similarities for the Matched Region    Expect Value






Example 37.

The NOV37 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 37A. 
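
The ORF start and stop positions reported in Table 37A can be cross-checked against the stated protein lengths. The sketch below assumes that the start position is the 1-based position of the A in the ATG codon and that the stop position is the 1-based position of the first base of the stop codon; this reading is an assumption, though it is consistent with the figures given for NOV37b and NOV37c. The helper name `orf_codons` is hypothetical.

```python
def orf_codons(start: int, stop: int) -> int:
    """Number of coding codons between a start codon and a stop codon.

    start: 1-based position of the first base of the start codon.
    stop:  1-based position of the first base of the stop codon.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


print(orf_codons(194, 1451))  # NOV37b: 419
print(orf_codons(147, 1533))  # NOV37c: 462
```

Both results equal the protein lengths listed for NOV37b (419 aa) and NOV37c (462 aa).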



Table 37A. NOV37 Sequence Analysis

SEQ ID NO: 149    2490 bp

NOV37a, CG120401-01 DNA Sequence

AAAAGGGCATTGAGACCNTGGGAGGTGACTTCATCAATTGCTCNTACCCCGGAGTTCTTAGAC 


AGACTTGGACCNATCCCCTACCTTGGGTGTGCACTTCTTGGGAAACGGAGGGGCCNGAGGGTC


CGAATCTCNTCTTCAGCCTTTTAAGCTTCACTTGGTCAGAATCCTTGGATGAGCNTGTGGGAC 


CGTTCCTCATAGCCCGGTGGTTTGANCCAGTGGCTTTGGGACTGTAAGAGGATGGACAAAGGT 


AGTATGAGGCCGGGACAGCTCCCCAAATGCTCGGGCGCCAAGCATCTGCTGATTCCACTCTGT 


GCCTGGCACGTGTGTCTTTCGGTCCTGGCTAGGCCTTGCCACCTCCCCCTGGGCCAGCACCAC 


CCATGTACCTGTCGGCAGAGCCACAGTTGACTGACTTGCTGTTGAGCTGCTGCTCTTTAGCAC 


GGCTGCATTTTGATGGCAGAGGGAGCAGTTATGCCCAGAGCCCAGCGTGTCGGCATCACCAGT 


CTGACCTGAGCACCATGTGCTGCACGACATGACACGTGGCACCGGGGGAACTGCCCAGCGTGG 


AAGGTCTGGGCCAGGTCTGAGCCCAGATGGGATCTGGATGGCCAAGGAACTCTACCTTAAAAC 
CTCAAGTGTCAAGGAGGCAGGGGAGGGGCCAAGAGGGCTGGCAGGGGAAGGGGGCTGGGGAGG 
GGTCCCATTTGCAGAGGCTCTTAGGATCTTAGGAGGCCCGAATCCCACCATTTCTCTACTTGG 
TAGATCTCAGGGGCTGCTAGATTCATCCCTGATGGCATCAGGCACTGCCAGCCGCTCAGAGGA 
TGAGGAGTCACTGGCAGGGCAGAAGCGAGCCTCCTCCCAGGCCTTGGGCACCATCCCTAAACG 
GAGAAGCTCCTCCAGGTTCATCAAGAGGAAGAAGTTCGATGATGAGCTGGTGGAGAGCAGCCT 
GGCAAAATCTTCTACCCGGGCAAAGGGGGCCAGTGGGGTGGAACCAGGGCGCTGTTCGGGGAG 
TGAACCCTCCTCCAGTGAGAAGAAGAAGGTATCCAAAGCCCCCAGCACTCCTGTGCCACCCAG 
CCCAGCCCCAGCCCCTGGACTCACCAAGCGTGTGAAGAAGAGTAAACAGCCACTTCAGGTGAC
CAAGGATCTGGGCCGCTGGAAGCCTGCAGATGACCTCCTGCTCATAAATGCTGTGTTGCAGAC 
CAACGACCTGACCTCCGTCCACCTGGGCGTGAAATTCAGCTGCCGCTTCACCCTTCGGGAGGT 
CCAGGAGCGTTGGTACGCCCTGCTCTACGATCCTGTCATCTCCAAGTTGGCCTGTCAGGCCAT 
GAGGCAGCTGCACCCAGAGGCTATTGCAGCCATCCAGAGCAAGGCCCTGTTTAGCAAGGCTGA 
GGAGCAGCTGCTGAGCAAAGTGGGATCGACCAGCCAGCCCACCTTGGAGACCTTCCAGGACCT 
GCTGCACAGACACCCTGATGCCTTCTACCTGGCCCGTACCGCGAAGGCCCTGCAGGCCCACTG 
GCAGCTCATGAAGCAGTATTACCTGCTGGAGGACCAGACAGTGCAGCCGCTGCCCAAAGGGGA 
CCAAGTGCTGAACTTCTCTGATGCAGAGGACCTGATTGATGACAGTAAGCTCAAGGACATGCG 
AGATGAGGTCCTGGAACATGAGCTGATGGTGGCTGACCGGCGCCAGAAGCGAGAGATTCGGCA
GCTGGAACAGGAACTGCATAAGTGGCAGGTGCTAGTGGACAGCATCACAGGCATGAGCTCTCC 
GGACTTCGACAACCAGACACTGGCAGTGCTGCGGGGCCGCATGGTGCGGTACCTGATGCGCTC 
GCGTGAGATCACCCTGGGCAGAGCAACCAAGGATAACCAGATTGATGTGGACCTGTCTCTGGA 
GGGTCCGGCCTGGAAGATATCCCGGAAACAAGGTGTCATCAAGCTGAAGAACAACGGTGATTT 
CTTCATTGCCAATGAGGGTCGACGGCCCATCTACATCGATGGACGGCCGGTGCTCTGTGGCTC 
CAAATGGCGCCTCAGCAACAACTCTGTGGTGGAGATCGCCAGCCTGCGATTCGTCTTCCTTAT 
CAACCAGGACCTCATTGCCCTCATCAGGGCTGAGGCTGCCAAGATCACACCACAGTGAGGAGT 
GGTGGCAGGACTCGTGGGCCCTCTCCGGCCTGTTTCCCCTGCCACTCCAGCCCCCTTGAGCTG


GGAACTCAGGCTCCTGGAAAAACCTGGGCAGTGGGAGGCTCAGCTGCGGGCCATTGATTTGAG 


CCTTTGAGGGAGGATAGGGCTGGCCTTTGTGAAGCCAGCAGAGGCTGAGAACCTCAGGCTTCC 


CTAGATCCAGAGCCCCTCCCCATCTTCCTCTCTCTAAAAACAACCCTACCCCCCATTGCCACC


TTCACTCCTGTGTCTCCAGCTGATTAGCCTCAGACTCTTCTTTTATTGTTTTTCTTTTGTAAA 
TAAAAAGCACCAGGTTCAAAAAAAAAAAAAAAA




ORF Start: ATG at 533    ORF Stop: TGA at 2135

SEQ ID NO: 150    534 aa    MW at 59016.8 kD

NOV37a, CG120401-01 Protein Sequence


MTRGTGGTAQRGRSGPGLSPDGIWMAKELYLKTSSVKEAGEGPRGLAGEGGWGGVPFAEALRI 
LGGPNPTISLLARSQGLLDSSLMASGTASRSEDEESLAGQKRASSQALGTIPKRRSSSRFIKR
KKFDDELVESSLAKSSTRAKGASGVEPGRCSGSEPSSSEKKKVSKAPSTPVPPSPAPAPGLTK 
RVKKSKQPLQVTKDLGRWKPADDLLLINAVLQTNDLTSVHLGVKFSCRFTLREVQERWYALLY
DPVISKLACQAMRQLHPEAIAAIQSKALFSKAEEQLLSKVGSTSQPTLETFQDLLHRHPDAFY 
LARTAKALQAHWQLMKQYYLLEDQTVQPLPKGDQVLNFSDAEDLIDDSKLKDMRDEVLEHELM 
VADRRQKREIRQLEQELHKWQVLVDSITGMSSPDFDNQTLAVLRGRMVRYLMRSREITLGRAT 
KDNQIDVDLSLEGPAWKISRKQGVIKLKNNGDFFIANEGRRPIYIDGRPVLCGSKWRLSNNSV 
VEIASLRFVFLINQDLIALIRAEAAKITPQ 




SEQ ID NO: 151    1764 bp

NOV37b, CG120401-02 DNA Sequence


TACCCCGAGAGTTCTTAGACAGACTTGGACCGATCCCCTACCTTGGGTGTGCACTCTTGGGAG 


AACGAGGGGCGGAGGGTCCGAATCTCCTCTCAGCCTTTAAGCTCACCTGGTCAGAATCCTTGG 


ATGAGCCTGTGGGACCGTTCCTCCTAGCCCGGTGGTTTGGAACCAGTGGCTTTGGGACTGTAA 


GAGGATGGACAAAGATTCTCAGGGGCTGCTAGATTCATCCCTGATGGCATCAGGCACTGCCAG 
CCGCTCAGAGGATGAGGAGTCACTGGCAGGGCAGAAGCGAGCCTCCTCCCAGGCCTTGGGCAC 
CATCCCTAAACGGAGAAGCTCCTCCAGGTTCATCAAGAGGAAGAAGTTCGATGATGAGCTGGT 
GGAGAGCAGCCTGGCAAAATCTTCTACCCGGGCAAAGGGGGCCAGTGGGGTGGAACCAGGGCG
CTGTTCGGGGAGTGAACCCTCCTCCAGTGAGAAGAAGAAGGTATCCAAAGCCCCCAGCACTCC 
TGTGCCACCCAGCCCAGCCCCAGCCCCTGGACTCACCAAGCGTGTGAAGAAGAGTAAACAGCC 
ACTTCAGGTGACCAAGGATCTGGGCCGCTGGAAGCCTGCAGATGACCTCCTGCTCATAAATGC 
TGTGTTGCAGACCAACGACCTGACCTCCGTCCACCTGGGCGTGAAACTCAGCTGCCGCTTCAC 
CCTTCGAGAGGTCCAGGAGCGTTGGTACGCCCTGCTCTACGATCCTGTCATCTCCAAGTTGGC 
CTGTCAGGCCATGAGGCAGCTGCACCCAGAGGCTATTGCAGCCATCCAGAGCAAGGCCCTGCA 
GGCCCACTGGCAGCTCATGAAGCAGTATTACCTGCTGGAGGACCAGACAGTGCAGCCGCTGCC 
CAAAGGGGACCAAGTGCTGAACTTCTCTGATGCAGAGGACCTGATTGATGACAGTAAGCTCAA 
GGACATGCGAGATGAGGTCCTGGAACATGAGCTGATGGTGGCTGACCGGCGCCAGAAGCGAGA 
GATTCGGCAGCTGGAACAGGAACTGCATAAGTGGCAGGTGCTAGTGGACAGCATCACAGGCAT 
GAGCTCTCCGGACTTCGACAACCAGACACTGGCAGTACTGCGGGGCCGCATGGTGCGGTACCT 
GATGCGCTCGCGTGAGATCACCCTGGGCAGAGCAACCAAGGATAACCAGATTGATGTGGACCT 
GTCTCTGGAGGGTCCGGCCTGGAAGATATCCCGGAAACAAGGTGTCATCAAGCTGAAGAACAA 
CGGTGATTTCTTCATTGCCAATGAGGGTCGACGGCCCATCTACATCGATGGACGGCCGGTGCT 
CTGTGGCTCCAAATGGCGCCTCAGCAACAACTCTGTGGTGGAGATCGCCAGCCTGCGATTCGT 
CTTCCTTATCAACCAGGACCTCATTGCCCTCATCAGGGCTGAGGCTGCCAAGATCACACCACA 
GTGAGGAGTGGTGGCAGGACTCGTGGGCCCTCTCCGGCCTGTTTCCCCTGCCACTCCAGCCCC 


CTTGAGCTGGGAACTCAGGCTCCTGGAAAAACCTGGGCAGTGGGAGGCTCAGCTGCGGGCCAT 


TGATTTGAGCCTTTGAGGGAGGATAGGGCTGGCCTTTGTGAAGCCAGCAGAGGCTGAGAACCT 


CAGGCTTCCCTAGATCCAGAGCCCCTCCCCATCTTCCTCTCTCTAAAAACAACCCTACCCCCC 


ATTGCCACCTTCACTCCTGTGTCTCCAGCTGATTAGCCTCAGACTCTTCTTTTATTGTTTTTC 




ORF Start: ATG at 194    ORF Stop: TGA at 1451

SEQ ID NO: 152    419 aa    MW at 46940.2 kD

NOV37b, CG120401-02 Protein
Sequence 


MDKDSQGLLDSSLMASGTASRSEDEESLAGQKRASSQALGTIPKRRSSSRFIKRKKFDDELVE 
SSLAKSSTRAKGASGVEPGRCSGSEPSSSEKKKVSKAPSTPVPPSPAPAPGLTKRVKKSKQPL 
QVTKDLGRWKPADDLLLINAVLQTNDLTSVHLGVKLSCRFTLREVQERWYALLYDPVISKLAC 
QAMRQLHPEAIAAIQSKALQAHWQLMKQYYLLEDQTVQPLPKGDQVLNFSDAEDLIDDSKLKD 
MRDEVLEHELMVADRRQKREIRQLEQELHKWQVLVDSITGMSSPDFDNQTLAVLRGRMVRYLM 
RSREITLGRATKDNQIDVDLSLEGPAWKISRKQGVIKLKNNGDFFIANEGRRPIYIDGRPVLC 
GSKWRLSNNSVVEIASLRFVFLINQDLIALIRAEAAKITPQ




SEQ ID NO: 153    1914 bp

NOV37c, CG120401-03 DNA Sequence

CGCGGAGAAATTGTTGGATCTGGCAGTCTAGGAATGAATCTCCTCTCAGCCTTTAAGCTCACC


TGGTCAGAATCCTTGGATGAGCCTGTGGGACCGTTCCTCCTAGCCCGGTGGTTTGGAACCAGT 


GGCTTTGGGACTGTAAGAGGATGGACAAAGATTCTCAGGGGCTGCTAGATTCATCCCTGATGG 
CATCAGGCACTGCCAGCCGCTCAGAGGATGAGGAGTCACTGGCAGGGCAGAAGCGAGCCTCCT
CCCAGGCCTTGGGCACCATCCCTAAACGGAGAAGCTCCTCCAGGTTCATCAAGAGGAAGAAGT 
TCGATGATGAGCTGGTGGAGAGCAGCCTGGCAAAATCTTCTACCCGGGCAAAGGGGGCCAGTG
GGGTGGAACCAGGGCGCTGTTCGGGGAGTGAACCCTCCTCCAGTGAGAAGAAGAAGGTATCCA
AAGCCCCCAGCACTCCTGTGCCACCCAGCCCAGCCCCAGCCCCTGGACTCACCAAGCGTGTGA 
AGAAGAGTAAACAGCCACTTCAGGTGACCAAGGATCTGGGCCGCTGGAAGCCTGCAGATGACC 
TCCTGCTCATAAATGCTGTGTTGCAGACCAACGACCTGACCTCCGTCCACCTGGGCGTGAAAT
TCAGCTGCCGCTTCACCCTTCGGGAGGTCCAGGAGCGTTGGTACGCCCTGCTCTACGATCCTG


TCATCTCCAAGTTGGCCTGTCAGGCCATGAGGCAGCTGCACCCAGAGGCTATTGCAGCCATCC 
AGAGCAAGGCCCTGTTTAGCAAGGCTGAGGAGCAGCTGCTGAGCAAAGTGGGATCGACCAGCC
AGCCCACCTTGGAGACCTTCCAGGACCTGCTGCACAGACACCCTGATGCCTTCTACCTGGCCC 
GTACCGCGAAGGCCCTGCAGGCCCACTGGCAGCTCATGAAGCAGTATTACCTGCTGGAGGACC 
AGACAGTGCAGCCGCTGCCCAAAGGGGACCAAGTGCTGAACTTCTCTGATGCAGAGGACCTGA 
TTGATGACAGTAAGCTCAAGGACATGCGAGATGAGGTCCTGGAACATGAGCTGATGGTGGCTG 
ACCGGCGCCAGAAGCGAGAGATTCGGCAGCTGGAACAGGAACTGCATAAGTGGCAGGTGCTAG 
TGGACAGCATCACAGGCATGAGCTCTCCGGACTTCGACAACCAGACACTGGCAGTGCTGCGGG 
GCCGCATGGTGCGGTACCTGATGCGCTCGCGTGAGATCACCCTGGGCAGAGCAACCAAGGATA 
ACCAGATTGATGTGGACCTGTCTCTGGAGGGTCCGGCCTGGAAGATATCCCGGAAACAAGGTG 
TCATCAAGCTGAAGAACAACGGTGATTTCTTCATTGCCAATGAGGGTCGACGGCCCATCTACA 
TCGATGGACGGCCGGTGCTCTGTGGCTCCAAATGGCGCCTCAGCAACAACTCTGTGGTGGAGA 
TCGCCAGCCTGCGATTCGTCTTCCTTATCAACCAGGACCTCATTGCCCTCATCAGGGCTGAGG 
CTGCCAAGATCACACCACAGTGAGGAATGGTGGCAGGACTCGTGGGCCCTCTCCGGCCTGTTT 


CCCCTGCCACTCCAGCCCCCTTGAGCTGGGAACTCAGGCTCCTGGAAAAACCTGGGCAGTGGG 


AGGCTCAGCTGCGGGCCATTGATTTGAGCCTTTGAGGGAGGATAGGGCTGGCCTTTGTGAAGC 


CAGCAGAGGCTGAGAACCTCAGGCTTCCCTAGATCCAGAGCCCCTCCCCATCTTCCTCTCTCT 


AAAAACAACCCTACCCCCCATTCTACCCCCCATTGCCACCTTCACTCCTGTGTCTCCAGCTGA 


TTAGCCTCAGACTCTTCTTTTATTGTTTTTCTTTTGTAAATAAAAAGCACCAGGTTCCAAAGT 


AAAAAAAAAAAAAAAAAACTCGAG 




ORF Start: ATG at 147    ORF Stop: TGA at 1533

SEQ ID NO: 154    462 aa    MW at 51802.6 kD

NOV37c, CG120401-03 Protein Sequence


MDKDSQGLLDSSLMASGTASRSEDEESLAGQKRASSQALGTIPKRRSSSRFIKRKKFDDELVE
SSLAKSSTRAKGASGVEPGRCSGSEPSSSEKKKVSKAPSTPVPPSPAPAPGLTKRVKKSKQPL 
QVTKDLGRWKPADDLLLINAVLQTNDLTSVHLGVKFSCRFTLREVQERWYALLYDPVISKLAC 
QAMRQLHPEAIAAIQSKALFSKAEEQLLSKVGSTSQPTLETFQDLLHRHPDAFYLARTAKALQ 
AHWQLMKQYYLLEDQTVQPLPKGDQVLNFSDAEDLIDDSKLKDMRDEVLEHELMVADRRQKRE 
IRQLEQELHKWQVLVDSITGMSSPDFDNQTLAVLRGRMVRYLMRSREITLGRATKDNQIDVDL 
SLEGPAWKISRKQGVIKLKNNGDFFIANEGRRPIYIDGRPVLCGSKWRLSNNSVVEIASLRFV
FLINQDLIALIRAEAAKITPQ



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 37B. 
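
The whole-number percentages attached to the identity counts in this comparison appear consistent with truncating, rather than rounding, the ratio of matched residues to aligned length. That is an observation drawn from the figures themselves and not a documented rule, and other tables in this document may follow a different convention. A minimal sketch, with the hypothetical helper name `percent_truncated`:

```python
def percent_truncated(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, discarding any fractional part."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * matches // aligned_length


print(percent_truncated(371, 458))  # 81
print(percent_truncated(415, 458))  # 90 (the exact ratio is about 90.6%)
```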



Table 37B. Comparison of NOV37a against NOV37b and NOV37c.

Protein Sequence    NOV37a Residues/Match Residues    Identities/Similarities for the Matched Region
NOV37b              77..534 / 5..419                  371/458 (81%) / 371/458 (81%)
NOV37c              77..534 / 5..462                  415/458 (90%) / 415/458 (90%)

Further analysis of the NOV37a protein yielded the following properties shown in 
Table 37C. 



Table 37C. Protein Sequence Properties NOV37a

PSort analysis:    0.9600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis:  No Known Signal Sequence Predicted



A search of the NOV37a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 37D.



Table 37D. Geneseq Results for NOV37a

Geneseq Identifier    Protein/Organism/Length [Patent #, Date]    NOV37a Residues/Match Residues    Identities/Similarities for the Matched Region    Expect Value
AAY77555    Human MIF1 protein (plasmid pCM577) - Homo sapiens, 462 aa. [WO200005362-A1, 03-FEB-2000]    77..534 / 5..462    458/458 (100%) / 458/458 (100%)    0.0
AAY77554    Human MIF1 protein (plasmid pCM480) - Homo sapiens, 390 aa. [WO200005362-A1, 03-FEB-2000]    150..534 / 6..390    384/385 (99%) / 385/385 (99%)    0.0
ABB57874    Drosophila melanogaster polypeptide SEQ ID NO 414 - Drosophila melanogaster, 578 aa. [WO200171042-A2, 27-SEP-2001]    160..530 / 204..572    209/375 (55%) / 268/375 (70%)    e-111
AAY12243    Human 5' EST secreted protein SEQ ID NO: 556 - Homo sapiens, 42 aa. [WO9906554-A2, 11-FEB-1999]    77..113 / 5..41    37/37 (100%) / 37/37 (100%)    1e-11
AAG40260    Arabidopsis thaliana protein fragment SEQ ID NO: 49928 - Arabidopsis thaliana, 465 aa. [EP1033405-A2, 06-SEP-2000]    406..521 / 328..443    36/116 (31%) / 64/116 (55%)    9e-10


In a BLAST search of public sequence databases, the NOV37a protein was found to
have homology to the proteins shown in the BLASTP data in Table 37E. 



Table 37E. Public BLASTP Results for NOV37a

Protein Accession Number    Protein/Organism/Length    NOV37a Residues/Match Residues    Identities/Similarities for the Matched Portion    Expect Value
O75497    Cell cycle-regulated factor p78 - Homo sapiens (Human), 534 aa.    1..534 / 1..534    534/534 (100%) / 534/534 (100%)    0.0
Q96EZ8    Unknown (protein for MGC:19577) - Homo sapiens (Human), 462 aa.    77..534 / 5..462    458/458 (100%) / 458/458 (100%)    0.0
O14742    Nucleolar protein - Homo sapiens (Human), 462 aa.    77..534 / 5..462    457/458 (99%) / 457/458 (99%)    0.0
Q99L90    Similar to microspherule protein 1 - Mus musculus (Mouse), 462 aa.    77..534 / 5..462    450/458 (98%) / 451/458 (98%)    0.0
O35255    Nucleolar protein - Mus musculus (Mouse), 462 aa.    77..534 / 5..462    444/458 (96%) / 450/458 (97%)    0.0



PFam analysis predicts that the NOV37a protein contains the domains shown in
Table 37F. 
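
Match regions in these tables are written as inclusive residue ranges of the form start..end. The sketch below converts such a range into a residue count; the helper name `region_length` is hypothetical. Where the denominator in the identities column is larger than the region length, as with the FHA domain match (region 435..508, counts out of 82), the difference presumably reflects gap columns in the alignment, although the source does not state this.

```python
def region_length(region: str) -> int:
    """Number of residues in an inclusive 'start..end' range."""
    start, end = (int(part) for part in region.split(".."))
    if end < start:
        raise ValueError("end precedes start")
    return end - start + 1


print(region_length("435..508"))  # 74
print(region_length("77..534"))   # 458
```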



Table 37F. Domain Analysis of NOV37a

Pfam Domain    NOV37a Match Region    Identities/Similarities for the Matched Region    Expect Value
FHA            435..508               16/82 (20%) / 62/82 (76%)                         6.2e-14



Example 38. 

The NOV38 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 38A. 



Table 38A. NOV38 Sequence Analysis 



SEQ ID NO: 155 | 5142 bp

NOV38a, CG122125-01 DNA Sequence

CCTGATGTGCAAGTACCAACATCTGTAAAAGATATGCGCTATTGCCAGGTTTCATTCCAAGAT



GATCATGTGTCTTTGGAAAGTGCGTTTACAGTAAGACCACTTCCTGATGAACCTAAACATTTA 
AAATGTGAAATGAAAGGAGGAAAAACAGTACAGATGGGCCAAGAGCTTCAAGGAGAAGTAGTT 
ATAATAATTACAGATCAGTACGGAAATCAGATTCAAGCATTTTCACCAAGTTCTTTATCTTCT 
TTGTCAATTGCTGGGGTTGGACTTGATAGCTCAAATTTGAAAACAACCTTTCAGGAAAACACA
CAGAGTATAAGTGTAAGAGGCATCAAATTTATTCCAGGTCCTCCTGGAAATAAGGATCTTTGT
TTTACTTGGCGTGAGTTTTCTGACTTTATTCGAGTGCAACTAATTTCTGGACCTCCTGCTAAA 
CTTCTCCTTATAGACTGGCCAGAACTAAAGGAGTCCATTCCAGTGATTGGAAGAGATTTACAG 
AACCCTATTATTGTTCAACTTTGTGATCAGTGGGATAATCCAGCACCGGTACAACATGTTAAA 
ATAAGTCTTACAAAAGCTAGCAATTTAAAGCTCATGCCTTCAAACCAACAGCATAAAACAGAT 
GAGAAAGGCAGGGCTAATTTGGGAGTATTCAGTGTTTTTGCCCCTAGGGGAGAGCATACTCTT 
CAGGTTAAAGCCATCTATAACAAAAGTATCATAGAAGGACCTATAATTAAGTTAATGATTCTT 
CCAGACCCAGAAAAACCCGTTCGTCTCAATGTTAAATATGACAAAGATGCATCCTTCTTAGCA
GGGGGTCTTTTCACTGATTTTATGATTAGTGTTATTTCTGAAGATGACAGTATCATTAAAAAC 
ATTAATCCAGCACGTATTTCCATGAAAATGTGGAAGCTGTCTACCAGTGGGAACCGACCCCCA 
GCAAATGCAGAAACATTTAGTTGTAATAAAATAAAAGATAATGACAAAGAAGATGGCTGCTTC 
TATTTCAGGGATAAAGTAATTCCTAATAAAGTGGGGACATATTGTATCCAGTTTGGTTTTATG 
ATGGATAAAACAAATATTCTCAACAGTGAACAGGTTATAGTTGAAGTCCTGCCTAATCAACCT 
GTGAAGTTAGTACCTAAAATTAAACCACCTACACCAGCTGTTTCAAATGTTCGCTCAGTTGCC 
AGTAGGACCTTGGTCAGAGATCTACATCTTAGTATCACGGATGACTACGACAACCATACTGGA 
ATTGATTTGGTTGGCACTATAATAGCCACCATTAAAGGCTCTAATGAGGAAGATACTGATACC 
CCACTTTTTATTGGGAAAGTTAGAACACTTGAATTCCCCTTCGTGAATGGTTCGGCTGAAATC 
ATGAGTCTGGTGCTGGCAGAAAGTAGTCCTGGAAGGGATAGTACTGAATATTTTATTGTATTT 
GAGCCCCGGCTACCACTTTTATCAAGAACCTTAGAACCATATATCCTACCGTTCATGTTTTAC 
AATGATGTTAAGAAGCAGCAACAAATGGCAGCACTTACAAAAGAAAAGGACCAATTATCTCAG
TCTATTGTTATGTATAAAAGTTTATTTGAAGCCAGCCAACAGCTTCTTAATGAAATGAAATGT
CAAGTTGAAGAAGCAAGATTAAAAGAGGCCCAATTGCGAAATGAACTAAAAATACATAATATT
GACATTCCTACAACACAACAGGTGCCACACATTGAAGCACTTCTGAAAAGAAAGCTATCAGAA
CAAGAAGAACTGAAGAAAAAACCTAGAAGATCGTGTACTCTTCCAAACTATACTAAAGGCAGT 
GGAGATGTTTTGGGAAAGATTGCACATCTAGCACAAATTGAAGATGATAGAGCTGCGATGGTT 
ATTTCTTGGCATCTGGCAAGTGACATGGACTGTGTAGTCACCCTAACCACTGACGCTGCACGT 
CGTATCTATGATGAAACCCAAGGTCGTCAGCAGGTGTTGCCCCTTGATTCTATTTACAAGAAG
ACTCTTCCAGATTGGAAAAGATCTCTACCTCATTTCCGAAATGGAAAATTGTATTTTAAACCC
ATTGGAGATCCAGTCTTTGCTCGAGACTTGTTAACATTTCCAGATAATGTAGAACATTGTGAA 
ACAGTATTTGGTATGCTGTTAGGAGACACCATTATTTTGGATAATCTGGATGCGGCCAATCAT 
TATAGAAAAGAGGTTGTTAAAATTACACACTGTCCTACACTGCTGACCAGAGATGGAGATCGA 
ATTCGAAGTAATGGAAAGTTTGGGGGCCTTCAGAATAAAGCTCCTCCAATGGATAAACTTCGG 
GGAATGGTATTTGGAGCTCCAGTTCCAAAACAGTGTCTGATCTTAGGGGAACAAATAGATCTT
CTTCAGCAGTATCGTTCTGCTGTGTGCAAACTAGACAGTGTGAATAAGGATCTTAACAGTCAA 
TTAGAGTACCTTCGCACTCCGGATATGAGGAAGAAAAAGCAAGAACTTGATGAACATGAGAAA 
AATCTCAAACTAATAGAGGAAAAACTAGGTATGACTCCCATACGTAAGTGTAATGACTCATTG
CGTCATTCACCAAAGGTTGAGACGACAGATTGTCCAGTTCCTCCTAAAAGAATGAGACGAGAA 
GCTACAAGACAAAATAGGATTATAACCAAAACAGATGTATGAGAGGTGACAGAGAGAAGAGGC



CATTGGTCTCAGTAAGAATGCCCTGCTTTCTGCATCTCTGTTTCAGAAGACCAAGAGGGTGAC 



TTACCAGACTGAGTATTTCTGGGGACAATACAAGTACCTGGGCATGAATTTCCATTTCGATTC 



AGATGGGACTGGAAACAACCATTCAATTTTATGAATCTTACTGGACATTATGGATTTACTGGA 



ATTATTCCAGACATTATGCCCTTTGGTTGTCACTACCTTGCAAATGTGTAAGAGGAAAATGTG 



CTAATGTGGCAGTGACTGTAAAACTGGCACATGGCATTTATTAATCCTGAAGAAAAGTACATG 



TACTATTTTTCAGTATAAATATAATGAACATGTCAGAACTATTTCTTGAAAACCTTTTTATTA 



CTTTTGCGTGAATTTATTTAACAAAGATGTTTTGTCTTTTGTGTAAGGGAGGTTCTAGAGGCT



AGATGTTTAATTGTAAATATGTGAGGAAACTCAATGCAGAATTCAGGATAAAAATTTTAAAAG



CACAGGTATTTGGGAATTGAAATGTTAAGATACCCAGAACAACATTAAATCAATGAGTGAACT 



TGTGACAGTGGTAGCATTTCAAATTTCAAAAGACTTATCCTGTGTGTGTGTGTGTGTGTATAT 



ATATATATATATATAAATATATATATATAAAATATTCAGCAGCACCAAGTTTTATAACTATTG



TTTGTTTGACTTTATTAATACTAGAATATGTAGTCTCAGCCTTAATTTTACATTTACATTATT



TTGTAATTTTTTATTACTATTTTTAAGGGGTTAAAGAGAACATACATTCTCACATTAGTGTAC 



TTTCTGGTAGAAAGTTGCTGCAAAAACATTTGAAATGTATATTAACCTAATGTATGTCATATA 



TATGTCTTTGTGTAAGTTCAAGACTATTGATCTGTGAAGTTATTTTGTAAGGACATACATTTG 



GTAAGTAAGTTTGTGTCCCAGGAAATGTATGTGTTTTTAAACCCTTTCTAAATATGCAGGCCA 



TTAATAAATAAGATTGTTTCTTCCCTACTGAATAGATAAGTGTTTTTCTTTTTTAAAATTGGA 



AGCTTCATAAAAGTTATCTTGTTAAAAAACGATGATGATGTTAACCTATCTTTATAATTGGAA 



ATTATTTAAACTGTTTGTTGTTACAGAAGAAACAAAATGGTAATTACAGAATTAGCTGTGGGG 



AAGATTGGCTCCCATGGTACTACAGGTTGAGTATCCCTTATTCAAAATGGTTGGGACCTGAAG 



TGTTTTGGATTTTGGTTTTTTGTGGGGTTTTTAAATTATTATTTTGGAATATTTGCATTATAT



TTACCAATTTAGCATCCCTAATCCAAAAATCTTTCATTAGAAATTGGAAATGCTTCAATGAGC



ATTTCCCTTATGTGTCATATCTGCGCTCAAATGTTTTGAATTTTGGAACATTTCTTGTTTCTG 
ACTTTTAGATTAGGGATACTCACCCCGTACCAATATTTGGAGTCCTTAGACATTAGATTATAT 



GAAAATGATTGATTGATAGGTAAGAAAGGTTAAAGAGAAATGACTTTTTAGGAACTAGACTTG



AACGTATAATTAATATGTGGAACTAGTTTGCTTTTATAATTCATTGTATCAAGAAGGAGTCAT 



GTTTTAAATCCAAGGCATTGAGACTATCTAGAAGCAAAAAAGTTGGTTTTAAAAATTCTTTTT 



ATTAGAAGTGTGTTTAAAAACATGTCTTTTTTTCCCTTGAGCGACAGTGGCCTCACTACATTG 



CCCAGGCTAGTCTTGAACTCCTGGGTTCAAGCAGTCTTCCTGCCTTGGCCTCCTAAGTAGCAG



GGATTACAGGCATGCACCCACATGCCAATTTTTATTGCTTAAAATGGCGAGTTCTATGGATAT 



TATCTAGCCCCAAGCCCATATCTGTACACTTTTGTTCATTTTATAATACCAGAAAGAATGTTC 



TATCAGTAAAAGAAGGTTTTAAGTATGAATCTCCATTTTTGTGGAGTTTTTCTACCCTGTAAA 



AATTAACTTTGTGGTTCCCAGTAATTACAGTAAGGAGTTTTTCTATATTTTCACAGTTGGATT 



TATCTAAGGATATTTCTCAGCATATTAAGGAATCCTTTTCTACATTTGAGCAAATACTGAGGT 



TCATGTTGTACCAAATAATAATAAATTGCTTTTGTGTTTAATATGTAACACGTAAGAACAATT



GAAATTTTCTTCTAAGATTTAATACTAGTCTTTTAGTATAAAGGAATGTAACTGGAGAAAAAA 



ATTAATGCTACAGCCATTTATCCTGTGTTATGTGTTATATAATCTGAAATTTGTCACAATGTT 



ACAATCAGGAAGTTTTTTTTCTGGTTTCTTCTTCCAAAGGAAAATAATAGCCTACTTGTATTT



TGAAGTAACTGAAATAAAACTCTCTCCAAGCCTTTTTGC



ORF Start: ATG at 34 | ORF Stop: TGA at 2686

SEQ ID NO: 156 | 884 aa | MW at 99946.2kD

NOV38a, CG122125-01 Protein Sequence



MRYCQVSFQDDHVSLESAFTVRPLPDEPKHLKCEMKGGKTVQMGQELQGEVVIIITDQYGNQI
QAFSPSSLSSLSIAGVGLDSSNLKTTFQENTQSISVRGIKFIPGPPGNKDLCFTWREFSDFIR 
VQLISGPPAKLLLIDWPELKESIPVIGRDLQNPIIVQLCDQWDNPAPVQHVKISLTKASNLKL 
MPSNQQHKTDEKGRANLGVFSVFAPRGEHTLQVKAIYNKSIIEGPIIKLMILPDPEKPVRLNV
KYDKDASFLAGGLFTDFMISVISEDDSIIKNINPARISMKMWKLSTSGNRPPANAETFSCNKI 
KDNDKEDGCFYFRDKVIPNKVGTYCIQFGFMMDKTNILNSEQVIVEVLPNQPVKLVPKIKPPT 
PAVSNVRSVASRTLVRDLHLSITDDYDNHTGIDLVGTIIATIKGSNEEDTDTPLFIGKVRTLE
FPFVNGSAEIMSLVLAESSPGRDSTEYFIVFEPRLPLLSRTLEPYILPFMFYNDVKKQQQMAA
LTKEKDQLSQSIVMYKSLFEASQQLLNEMKCQVEEARLKEAQLRNELKIHNIDIPTTQQVPHI 
EALLKRKLSEQEELKKKPRRSCTLPNYTKGSGDVLGKIAHLAQIEDDRAAMVISWHLASDMDC 
VVTLTTDAARRIYDETQGRQQVLPLDSIYKKTLPDWKRSLPHFRNGKLYFKPIGDPVFARDLL
TFPDNVEHCETVFGMLLGDTIILDNLDAANHYRKEVVKITHCPTLLTRDGDRIRSNGKFGGLQ
NKAPPMDKLRGMVFGAPVPKQCLILGEQIDLLQQYRSAVCKLDSVNKDLNSQLEYLRTPDMRK 
KKQELDEHEKNLKLIEEKLGMTPIRKCNDSLRHSPKVETTDCPVPPKRMRREATRQNRIITKT 
DV 
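
The ORF coordinates and the protein length reported in Table 38A can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the coordinates are 1-based, that the start value is the first base of the ATG, and that the stop value is the first base of the stop codon. The function name `orf_codon_count` is illustrative only and is not part of the original disclosure.

```python
def orf_codon_count(orf_start: int, stop_codon_start: int) -> int:
    """Return the number of sense codons in an open reading frame.

    Assumes 1-based coordinates: orf_start is the first base of the ATG and
    stop_codon_start is the first base of the (untranslated) stop codon.
    """
    coding_length = stop_codon_start - orf_start
    if coding_length <= 0 or coding_length % 3 != 0:
        raise ValueError("coordinates do not describe an in-frame ORF")
    return coding_length // 3


# NOV38a (Table 38A): ATG at 34, TGA at 2686, reported length 884 aa
print(orf_codon_count(34, 2686))  # 884
```

Under the same assumptions, the check can be applied to any entry that lists an ORF start, an ORF stop, and a polypeptide length.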



Further analysis of the NOV38a protein yielded the properties shown in
Table 38B.



Table 38B. Protein Sequence Properties NOV38a

PSort analysis: 0.9600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV38a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 38C.



Table 38C. Geneseq Results for NOV38a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV38a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABG14416 | Novel human diagnostic protein #14407 - Homo sapiens, 1482 aa. [WO200175067-A2, 11-OCT-2001] | 1..884; 577..1482 | 884/906 (97%); 884/906 (97%) | 0.0
ABB97256 | Novel human protein SEQ ID NO: 524 - Homo sapiens, 851 aa. [WO200222660-A2, 21-MAR-2002] | 35..884; 1..851 | 850/851 (99%); 850/851 (99%) | 0.0
ABG20071 | Novel human diagnostic protein #20062 - Homo sapiens, 848 aa. [WO200175067-A2, 11-OCT-2001] | 38..884; 1..848 | 847/848 (99%); 847/848 (99%) | 0.0
AAE01527 | Human gene 15 encoded secreted protein fragment, SEQ ID NO: 184 - Homo sapiens, 646 aa. [WO200134626-A1, 17-MAY-2001] | 239..884; 1..646 | 646/646 (100%); 646/646 (100%) | 0.0
ABG20070 | Novel human diagnostic protein #20061 - Homo sapiens, 1300 aa. [WO200175067-A2, 11-OCT-2001] | 123..785; 273..830 | 459/663 (69%); 484/663 (72%) | 0.0



In a BLAST search of public sequence databases, the NOV38a protein was found to
have homology to the proteins shown in the BLASTP data in Table 38D.



Table 38D. Public BLASTP Results for NOV38a

Protein Accession Number | Protein/Organism/Length | NOV38a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
O75141 | KIAA0650 protein - Homo sapiens (Human), 848 aa (fragment). | 38..884; 1..848 | 847/848 (99%); 847/848 (99%) | 0.0
Q9D4M7 | 4931400A14Rik protein - Mus musculus (Mouse), 887 aa. | 2..877; 3..879 | 719/877 (81%); 809/877 (91%) | 0.0
Q9UG39 | Hypothetical 82.6 kDa protein - Homo sapiens (Human), 728 aa (fragment). | 157..884; 1..728 | 727/728 (99%); 728/728 (99%) | 0.0
Q9H6Q2 | CDNA: FLJ21993 fis, clone HEP06576 - Homo sapiens (Human), 267 aa. | 618..884; 1..267 | 267/267 (100%); 267/267 (100%) | e-157
Q8T1V3 | Hypothetical 255.5 kDa protein - Dictyostelium discoideum (Slime mold), 2284 aa. | 246..644; 1646..2070 | 111/455 (24%); 182/455 (39%) | 2e-08
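
The identity and similarity cells in the BLASTP tables pair a ratio with an integer percentage, so each cell can be checked for internal consistency. The sketch below assumes that the percentage is the ratio truncated to a whole number; that convention is an assumption inferred from cells such as 847/848 (99%), not something stated in the text. The helper names are hypothetical. A flagged cell may reflect a different rounding convention in the original software or a transcription error, so the result is only a prompt for manual review.

```python
import re
from typing import Tuple


def parse_match(cell: str) -> Tuple[int, int, int]:
    """Parse a cell such as '847/848 (99%)' into (matches, aligned, percent)."""
    m = re.fullmatch(r"\s*(\d+)/(\d+)\s*\((\d+)%\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    matches, aligned, percent = (int(g) for g in m.groups())
    return matches, aligned, percent


def truncated_percent(matches: int, aligned: int) -> int:
    """Integer percentage truncated toward zero (assumed convention)."""
    return (100 * matches) // aligned


def is_consistent(cell: str) -> bool:
    """True when the reported percentage equals the truncated ratio."""
    matches, aligned, reported = parse_match(cell)
    return truncated_percent(matches, aligned) == reported


# Cells taken from Table 38D
for cell in ["847/848 (99%)", "111/455 (24%)", "182/455 (39%)"]:
    print(cell, "consistent" if is_consistent(cell) else "needs review")
```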



Pfam analysis predicts that the NOV38a protein contains the domains shown in
Table 38E.



Table 38E. Domain Analysis of NOV38a

Pfam Domain | NOV38a Match Region | Identities; Similarities for the Matched Region | Expect Value



Example 39.

The NOV39 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 39A. 



Table 39A. NOV39 Sequence Analysis



SEQ ID NO: 157 | 9195 bp

NOV39a, CG122195-01 DNA Sequence

ATGGCCAAGTATGGAGAACATGAAGCCAGTCCTGACAATGGGCAGAACGAATTCAGTGATATC
ATTAAGTCCAGATCTGATGAACACAATGACGTACAGAAGAAAACCTTTACCAAATGGATAAAT



GCTCGATTTTCAAAGAGTGGGAAACCACCCATCAATGATATGTTCACAGACCTCAAAGATGGA 



AGGAAGCTATTGGATCTTCTAGAAGGCCTCACAGGAACATCACTGCCAAAGGAACGTGGTTCC 

ACAAGGGTACATGCCTTAAATAACGTCAACAGAGTGCTGCAGGTTTTACATCAGAACAATGTG 

GAATTAGTGAATATAGGGGGAACTGACATTGTGGATGGAAATCACAAACTGACTTTGGGGTTA 

CTTTGGAGCATCATTTTGCACTGGCAGGTGAAAGATGTCATGAAGGATGTCATGTCGGACCTG 

CAGCAGACGAACAGTGAGAAGATCCTGCTCAGCTGGGTGCGTCAGACCACCAGGCCCTACAGC 

CAAGTCAACGTCCTCAACTTCACCACCAGCTGGACAGATGGACTCGCCTTTAATGCTGTCCTC 

CACCGACATAAACCTGATCTCTTCAGCTGGGATAAAGTTGTCAAAATGTCACCAATTGAGAGA 

CTTGAACATGCCTTCAGCAAGGCTCAAACTTATTTGGGAATTGAAAAGCTGTTAGATCCTGAA

GATGTTGCCGTTCGGCTTCCTGACAAGAAATCCATAATTATGTATTTAACATCTTTGTTTGAG 

GTGCTACCTCAGCAAGTCACCATAGACGCCATCCGTGAGGTAGAGACACTCCCAAGGAAATAT 

AAAAAAGAATGTGAAGAAGAGGCAATTAATATACAGAGTACAGCGCCTGAGGAGGAGCATGAG 

AGTCCCCGAGCTGAAACTCCCAGCACTGTCACTGAGGTCGACATGGATCTGGACAGCTATCAG 

ATTGCGTTGGAGGAAGTGCTGACCTGGTTGCTTTCTGCTGAGGACACTTTCCAGGAGCAGGAT 

GATATTTCTGATGATGTTGAAGAAGTCAAAGACCAGTTTGCAACCCATGAAGCTTTTATGATG 

GAACTGACTGCACACCAGAGCAGTGTGGGCAGCGTCCTGCAGGCAGGCAACCAACTGATAACA 

CAAGGAACTCTGTCAGACGAAGAAGAATTTGAGATTCAGGAACAGATGACCCTGCTGAATGCT 

AGATGGGAGGCTCTTAGGGTGGAGAGTATGGACAGACAGTCCCGGCTGCACGATGTGCTGATG

GAACTGCAGAAGAAGCAACTGCAGCAGCTCTCCGCCTGGTTAACACTCACAGAGGAGCGCATT 

CAGAAGATGGAAACTTGCCCCCTGGATGATGATGTAAAATCTCTACAAAAGCTGCTAGAAGAA 

CATAAAAGTTTGCAAAGTGATCTTGAGGCTGAACAGGTGAAAGTAAATTCACTAACTCACATG 

GTGGTCATTGTTGATGAAAACAGTGGTGAGAGCGCTACAGCTATCCTAGAAGACCAGTTACAG 

AAACTTGGTGAGCGCTGGACAGCAGTATGCCGTTGGACTGAAGAACGCTGGAATAGGTTACAA 

GAAATCAATATATTGTGGCAGGAATTATTGGAAGAACAGTGCTTGTTGAAAGCTTGGTTAACC 

GAAAAAGAAGAGGCTTTAAATAAAGTCCAGACAAGCAACTTCAAAGACCAAAAGGAACTAAGT 

GTCAGTGTTCGACGTCTGGCTATTTTGAAGGAAGACATGGAAATGAAGCGTCAAACATTGGAT 

CAGCTGAGTGAGATTGGCCAGGATGTGGGACAATTACTTGATAATTCCAAGGCATCTAAGAAG

ATCAACAGTGACTCAGAGGAACTGACTCAAAGATGGGATTCTTTGGTTCAGAGACTAGAAGAT

TCCTCCAACCAGGTGACTCAGGCTGTAGCAAAGCTGGGGATGTCTCAGATTCCTCAGAAGGAC 

CTTTTGGAGACTGTTCGTGTAAGAGAACAAGCAATTACAAAAAAATCTAAGCAGGAACTGCCT

CCTCCTCCTCCCCCAAAGAAGAGACAGATCCATGTGGATATTGAAGCTAAGAAAAAGTTTGAT 

GCTATAAGTGCAGAGCTGTTGAACTGGATTTTGAAATGGAAAACTGCCATTCAGACCACAGAG

ATAAAAGAGTATATGAAGATGCAAGACACTTCCGAAATGAAAAAGAAGTTGAAGGCATTAGAA 

AAAGAACAGAGAGAAAGAATCCCCAGAGCAGATGAATTAAACCAAACTGGACAAATCCTTGTG 

GAGCAAATGGGAAAAGAAGGCCTTCCTACTGAAGAAATAAAAAATGTTCTGGAGAAGGTTTCA 

TCAGAATGGAAGAATGTATCTCAACATTTGGAAGATCTAGAAAGAAAGATTCAGCTACAGGAA 

GATATAAATGCTTATTTCAAGCAGCTTGATGAGCTTGAAAAGGTCATCAAGACAAAGGAGGAG

TGGGTAAAACACACTTCCATTTCTGAATCTTCCCGGCAGTCCTTGCCAAGCTTGAAGGATTCC 

TGTCAGCGGGAATTGACAAATCTTCTTGGCCTTCACCCCAAAATTGAAATGGCTCGTGCAAGC 

TGCTCGGCCCTGATGTCTCAGCCTTCTGCCCCAGATTTTGTCCAGCGGGGCTTCGATAGCTTT 

CTGGGCCGCTACCAAGCTGTACAAGAGGCTGTAGAGGATCGTCAACAACATCTAGAGAATGAA 

CTGAAGGGCCAACCTGGACATGCATATCTGGAAACATTGAAAACACTGAAAGATGTGCTAAAT 

GATTCAGAAAATAAGGCCCAGGTGTCTCTGAATGTCCTTAATGATCTTGCCAAGGTGGAGAAG 

GCCCTGCAAGAAAAAAAGACCCTTGATGAAATCCTTGAGAATCAGAAACCTGCATTACATAAA 

CTTGCAGAAGAAACAAAGGCTCTGGAGAAAAATGTTCATCCTGATGTAGAAAAATTATATAAG 

CAAGAATTTGATGATGTGCAAGGAAAGTGGAACAAGCTAAAGGTCTTGGTTTCCAAAGATCTA 

CATTTGCTTGAGGAAATTGCTCTCACACTCAGAGCTTTTGAGGCCGATTCAACAGTCATTGAG

AAGTGGATGGATGGCGTGAAAGACTTCTTAATGAAACAGCAGGCTGCCCAAGGAGACGACGCA 

GGTCTACAGAGGCAGTTAGACCAGTGCTCTGCATTTGTTAATGAAATAGAAACAATTGAATCA 

TCTCTGAAAAACATGAAGGAAATAGAGACTAATCTTCGAAGTGGTCCAGTTGCTGGAATAAAA 

ACTTGGGTGCAGACAAGACTAGGTGACTACCAAACTCAACTGGAGAAACTTAGCAAGGAGATC

GCTACTCAAAAAAGTAGGTTGTCTGAAAGTCAAGAAAAAGCTGCGAACCTGAAGAAAGACTTG 

GCAGAGATGCAGGAATGGATGACCCAGGCCGAGGAAGAATATTTGGAGCGGGATTTTGAGTAC

AAGTCACCAGAAGAGCTTGAGAGTGCTGTGGAAGAGATGAAGAGGGCAAAAGAGGATGTGTTG 

CAGAAGGAGGTGAGAGTGAAGATTCTCAAGGACAACATCAAGTTATTAGCTGCCAAGGTGCCC 

TCTGGTGGCCAGGAGTTGACGTCTGAGCTGAATGTTGTGCTGGAGAATTACCAACTTCTTTGT 

AATAGAATTCGAGGAAAGTGCCACACGCTAGAGGAGGTCTGGTCTTGTTGGATTGAACTGCTT

CACTATTTGGATCTTGAAACTACCTGGTTAAACACTTTGGAAGAGCGGATGAAGAGCACAGAG 

GTCCTGCCTGAGAAGACGGATGCTGTCAACGAAGCCCTGGAGTCTCTGGAATCTGTTCTGCGC 

CACCCGGCAGATAATCGCACCCAGATTCGAGAGCTTGGCCAGACTCTGATTGATGGGGGGATC 

CTGGATGATATAATCAGTGAGAAACTGGAGGCTTTCAACAGCCGATATGAAGATCTAAGTCAC 

CTGGCAGAGAGCAAGCAGATTTCTTTGGAAAAGCAACTCCAGGTGCTGCGGGAAACTGACCAG 

ATGCTTCAAGTCTTGCAAGAGAGCTTGGGGGAGCTGGACAAACAGCTCACCACATACCTGACT 

GACAGGATAGATGCTTTCCAAGTTCCACAGGAAGCTCAGAAAATCCAAGCAGAGATCTCAGCC 

CATGAGCTAACCCTAGAGGAGTTGAGAAGAAATATGCGTTCTCAGCCCCTGACCTCCCCAGAG 

AGTAGGACTGCCAGAGGAGGAAGTCAGATGGATGTGCTACAGAGGAAACTCCGAGAGGTGTCC
ACAAAGTTCCAGCTTTTCCAGAAGCCAGCTAACTTCGAGCAGCGCATGCTGGACTGCAAGCGT
GTGCTGGATGGCGTGAAAGCAGAACTTCACGTTCTGGATGTGAAGGACGTAGACCCTGACGTC
ATACAGACGCACCTGGACAAGTGTATGAAACTGTATAAAACTTTGAGTGAAGTCAAACTTGAA
GTGGAAACTGTGATTAAAACAGGAAGACATATTGTCCAGAAACAGCAAACGGACAACCCAAAA

GGGATGGATGAGCAGCTGACTTCCCTGAAGGTTCTTTACAATGACCTGGGCGCACAGGTGACA
GAAGGAAAACAGGATCTGGAAAGAGCATCACAGTTGGCCCGGAAAATGAAGAAAGAGGCTGCT 
TCTCTCTCTGAATGGCTTTCTGCTACTGAAACTGAATTGGTACAGAAGTCCACTTCAGAAGGT
CTGCTTGGTGACTTGGATACAGAAATTTCCTGGGCTAAAAATGTTCTGAAGGATCTGGAAAAG
AGAAAAGCTGATTTAAATACCATCACAGAGAGTAGTGCTGCCCTGCAAAACTTGATTGAGGGC 
AGTGAGCCTATTTTAGAAGAGAGGCTCTGCGTCCTTAACGCTGGGTGGAGCCGAGTTCGTACC 
TGGACTGAAGATTGGTGCAATACCTTGATGAACCATCAGAACCAGCTAGAAATATTTGATGGG 
AACGTGGCTCACATAAGTACCTGGCTTTATCAAGCTGAAGCTCTATTGGATGAAATTGAAAAG
AAACCAACAAGTAAACAGGAAGAAATTGTGAAGCGTTTAGTATCTGAGCTGGATGATGCCAAC 
CTCCAGGTTGAAAATGTCCGCGATCAAGCCCTTATTTTGATGAATGCCCGTGGAAGCTCAAGC 
AGGGAGCTTGTAGAACCAAAGTTAGCTGAGCTGAATAGGAACTTTGAAAAGGTGTCTCAACAT 
ATCAAAAGTGCCAAATTGCTAATTGCTCAGGAACCATTATACCAATGTTTGGTCACCACTGAA
ACATTTGAAACTGGTGTGCCTTTCTCTGACTTGGAAAAATTAGAAAATGACATAGAAAATATG 
TTAAAATTTGTGGAAAAACACTTGGAATCCAGTGATGAAGATGAAAAGATGGATGAGGAGAGT 
GCCCAGATTGAGGAAGTTCTACAAAGAGGAGAAGAAATGTTACATCAACCTATGGAAGATAAT 
AAAAAAGAAAAGATCCGTTTGCAATTATTACTTTTGCATACTAGATACAACAAAATTAAGGCA 
ATCCCTATTCAACAGAGGAAAATGGGTCAACTTGCTTCTGGAATTAGATCATCACTTGTTCCT 
ACAGATTATCTGGTTGAAATTAACAAAATTTTACTTTGCATGGATGATGTTGAATTATCGCTT
AATGTTCCAGAGCTCAACACTGCTATTTACGAAGACTTCTCTTTTCAGGAAGACTCTCTGAAG
AATATCAAAGACCAACTGGACAAACTTGGAGAGCAGATTGCAGTCATTCATGAAAAACAGCCA
GATGTCATCCTTGAAGCCTCTGGACCTGAAGCCATTCAGATCAGAGATACACTTACTCAGCAT 
GGCGTTGAGCTAAGACAGCAGCAGCTTGAGGACATGATTATTGACAGTCTTCAGTGGGATGAC 
CATAGGGAGGAGACTGAAGAACTGATGAGAAAATATGAGGCTCGACTCTATATTCTTCAGCAA 
GCCCGACGGGATCCACTCACCAAACAAATTTCTGATAACCAAATACTGCTTCAAGAACTGGGT 
CCTGGAGATGGTATCGTCATGGCGTTCGGATACGTCCTGCAGAAACTCTGGAGGGAATATGGG 
AGTGATGACACAAGGAATGTGAAAGAAACCACAGAGTACTTAAAAACATCATGGATCAATCTC 
AAACAAAGTATTGCTGACAGACAGAACGCCTTGGAGGCTGAGTGGAGGACGGTGCAGGCCTCT 
CGCAGAGATCTGGAAAACTTCCTGAAGTGGATCCAAGAAGCAGAGACCACAGTGAATGTGCTT 
GTGGATGCCTCTCATCGGGAGAATGCTCTTCAGGATAGTATCTTGGCCAGGGAACTCAAACAG 
CAGATGCAGGACATCCAGGCAGAAATTGATGCCCACAATGACATATTTAAAAGCATTGACGGA 
AACAGGCAGAAGATGGTAAAAGCTTTGGGAAATTCTGAAGAGGCTACTATGCTTCAACATCGA
CTGGATGATATGAACCAAAGATGGAATGACTTAAAAGCAAAATCTGCTAGCATCAGGGCCCAT 
TTGGAGGCCAGCGCTGAGAAGTGGAACAGGTTGCTGATGTCCTTAGAAGAACTGATCAAATGG 
CTGAATATGAAAGATGAAGAGCTTAAGAAACAAATGCCTATTGGAGGAGATGTTCCAGCCTTA 
CAGCTCCAGTATGACCATTGTAAGGCCCTGAGACGGGAGTTAAAGGAGAAAGAATATTCTGTC 
CTGAATGCTGTCGACCAGGCCCGAGTTTTCTTGGCTGATCAGCCAATTGAGGCCCCTGAAGAG
CCAAGAAGAAACCTACAATCAAAAACAGAATTAACTCCTGAGGAGAGAGCCCAAAAGATTGCC 
AAAGCCATGCGCAAACAGTCTTCTGAAGTCAAAGAAAAATGGGAAAGTCTAAATGCTGTAACT 
AGCAATTGGCAAAAGCAAGTGGACAAGGCATTGGAGAAACTCAGAGACCTGCAGGGAGCTATG 
GATGACCTGGACGCTGACATGAAGGAGGCAGAGTCCGTGCGGAATGGCTGGAAGCCCGTGGGA 
GACTTACTCATTGACTCGCTGCAGGATCACATTGAAAAAATCATGGCATTTAGAGAAGAAATT 
GCACCAATCAACTTTAAAGTTAAAACGGTGAATGATTTATCCAGTCAGCTGTCTCCACTTGAC 
CTGCATCCCTCTCTAAAGATGTCTCGCCAGCTAGATGACCTTAATATGCGATGGAAACTTTTA 
CAGGTTTCTGTGGATGATCGCCTTAAACAGCTTCAGGAAGCCCACAGAGATTTTGGACCATCC 
TCTCAGCATTTTCTCTCTACGTCAGTCCAGCTGCCGTGGCAAAGATCCATTTCACATAATAAA 
GTGCCCTATTACATCAACCATCAAACACAGACCACCTGTTGGGACCATCCTAAAATGACCGAA 
CTCTTTCAATCCCTTGCTGACCTGAATAATGTACGTTTTTCTGCCTACCGTACAGCAATCAAA 
ATCCGAAGACTACAAAAAGCACTATGTTTGGATCTCTTAGAGTTGAGTACAACAAATGAAATT 
TTCAAACAGCACAAGTTGAACCAAAATGACCAGCTCCTCAGTGTTCCAGATGTCATCAACTGT 
CTGACAACAACTTATGATGGACTTGAGCAAATGCATAAGGACCTGGTCAACGTTCCACTCTGT 
GTTGATATGTGTCTCAATTGGTTGCTCAATGTCTATGACACGGGTCGAACTGGAAAAATTAGA
GTGCAGAGTCTGAAGATTGGATTAATGTCTCTCTCCAAAGGTCTCTTGGAAGAAAAATACAGA

TATCTCTTTAAGGAAGTTGCGGGGCCGACAGAAATGTGTGACCAGAGGCAGCTGGGCCTGTTA
CTTCATGATGCCATCCAGATCCCCCGGCAGCTAGGTGAAGTAGCAGCTTTTGGAGGCAGTAAT
ATTGAGCCTAGTGTTCGCAGCTGCTTCCAACAGAATAACAATAAACCAGAAATAAGTGTGAAA

GAGTTTATAGATTGGATGCATTTGGAACCACAGTCCATGGTTTGGCTCCCAGTTTTACATCGA
GTGGCAGCAGCGGAGACTGCAAAACATCAGGCCAAATGCAACATCTGTAAAGAATGTCCAATT 
GTCGGGTTCAGGTATAGAAGCCTTAAGCATTTTAACTATGATGTCTGCCAGAGTTGTTTCTTT 
TCGGGTCGAACAGCAAAAGGTCACAAATTACATTACCCAATGGTGGAATATTGTATACCTACA 
ACATCTGGGGAAGATGTACGAGACTTCACAAAGGTACTTAAGAACAAGTTCAGGTCGAAGAAG
TACTTTGCCAAACACCCTCGACTTGGTTACCTGCCTGTCCAGACAGTTCTTGAAGGTGACAAC 
TTAGAGACTCCTATCACACTCATCAGTATGTGGCCAGAGCACTATGACCCCTCACAATCTCCT
CAACTGTTTCATGATGACACCCATTCAAGAATAGAACAATATGCCACACGACTGGCCCAGATG
GAAAGGACTAATGGGTCTTTTCTCACTGATAGCAGCTCCACCACAGGAAGTGTGGAAGACGAG 
CACGCCCTCATCCAGCAGTATTGCCAAACACTCGGAGGAGAGTCCCCAGTGAGCCAGCCGCAG 
AGCCCAGCTCAGATCCTGAAGTCAGTAGAGAGGGAAGAACGTGGAGAACTGGAGAGGATCATT 
GCTGACCTGGAGGAAGAACAAAGAAATCTACAGGTGGAGTATGAGCAGCTGAAGGACCAGCAC 
CTCCGAAGGGGGCTCCCTGTCGGTTCACCGCCAGAGTCGATTATATCTCCCCATCACACGTCT 
GAGGATTCAGAACTTATAGCAGAAGCAAAACTCCTCAGGCAGCACAAAGGTCGGCTGGAGGCT 
AGGATGCAGATTTTAGAAGATCACAATAAACAGCTGGAGTCTCAGCTCCACCGCCTCCGACAG 
CTGCTGGAGCAGCCTGAATCTGATTCCCGAATCAATGGTGTTTCCCCATGGGCTTCTCCTCAG 
CATTCTGCACTGAGCTACTCGCTTGATCCAGATGCCTCCGGCCCACAGTTCCACCAGGCAGCG 
GGAGAGGACCTGCTGGCCCCACCGCACGACACCAGCACGGATCTCACGGAGGTCATGGAGCAG 
ATTCACAGCACGTTTCCATCTTGCTGCCCAAATGTTCCCAGCAGGCCACAGGCAATGTGA




ORF Start: ATG at 1 | ORF Stop: TGA at 9193

SEQ ID NO: 158 | 3064 aa | MW at 352283.8kD

NOV39a, CG122195-01 Protein Sequence


MAKYGEHEASPDNGQNEFSDIIKSRSDEHNDVQKKTFTKWINARFSKSGKPPINDMFTDLKDG

RKLLDLLEGLTGTSLPKERGSTRVHALNNVNRVLQVLHQNNVELVNIGGTDIVDGNHKLTLGL 

LWSIILHWQVKDVMKDVMSDLQQTNSEKILLSWVRQTTRPYSQVNVLNFTTSWTDGLAFNAVL 

HRHKPDLFSWDKVVKMSPIERLEHAFSKAQTYLGIEKLLDPEDVAVRLPDKKSIIMYLTSLFE

VLPQQVTIDAIREVETLPRKYKKECEEEAINIQSTAPEEEHESPRAETPSTVTEVDMDLDSYQ 

IALEEVLTWLLSAEDTFQEQDDISDDVEEVKDQFATHEAFMMELTAHQSSVGSVLQAGNQLIT 

QGTLSDEEEFEIQEQMTLLNARWEALRVESMDRQSRLHDVLMELQKKQLQQLSAWLTLTEERI 

QKMETCPLDDDVKSLQKLLEEHKSLQSDLEAEQVKVNSLTHMVVIVDENSGESATAILEDQLQ

KLGERWTAVCRWTEERWNRLQEINILWQELLEEQCLLKAWLTEKEEALNKVQTSNFKDQKELS 

VSVRRLAILKEDMEMKRQTLDQLSEIGQDVGQLLDNSKASKKINSDSEELTQRWDSLVQRLED 

SSNQVTQAVAKLGMSQIPQKDLLETVRVREQAITKKSKQELPPPPPPKKRQIHVDIEAKKKFD 

AISAELLNWILKWKTAIQTTEIKEYMKMQDTSEMKKKLKALEKEQRERIPRADELNQTGQILV 

EQMGKEGLPTEEIKNVLEKVSSEWKNVSQHLEDLERKIQLQEDINAYFKQLDELEKVIKTKEE 

WVKHTSISESSRQSLPSLKDSCQRELTNLLGLHPKIEMARASCSALMSQPSAPDFVQRGFDSF 

LGRYQAVQEAVEDRQQHLENELKGQPGHAYLETLKTLKDVLNDSENKAQVSLNVLKDLAKVEK 

ALQEKKTLDEILENQKPALHKLAEETKALEKNVHPDVEKLYKQEFDDVQGKWNKLKVLVSKDL 

HLLEEIALTLRAFEADSTVIEKWMDGVKDFLMKQQAAQGDDAGLQRQLDQCSAFVNEIETIES 

SLKNMKEIETNLRSGPVAGIKTWVQTRLGDYQTQLEKLSKEIATQKSRLSESQEKAANLKKDL 

AEMQEWMTQAEEEYLERDFEYKSPEELESAVEEMKRAKEDVLQKEVRVKILKDNIKLLAAKVP 

SGGQELTSELNVVLENYQLLCNRIRGKCHTLEEVWSCWIELLHYLDLETTWLNTLEERMKSTE

VLPEKTDAVNEALESLESVLRHPADNRTQIRELGQTLIDGGILDDIISEKLEAFNSRYEDLSH 

LAESKQISLEKQLQVLRETDQMLQVLQESLGELDKQLTTYLTDRIDAFQVPQEAQKIQAEISA 

HELTLEELRRNMRSQPLTSPESRTARGGSQMDVLQRKLREVSTKFQLFQKPANFEQRMLDCKR 

VLDGVKAELHVLDVKDVDPDVIQTHLDKCMKLYKTLSEVKLEVETVIKTGRHIVQKQQTDNPK

GMDEQLTSLKVLYNDLGAQVTEGKQDLERASQLARKMKKEAASLSEWLSATETELVQKSTSEG 

LLGDLDTEISWAKNVLKDLEKRKADLNTITESSAALQNLIEGSEPILEERLCVLNAGWSRVRT 

WTEDWCNTLMNHQNQLEIFDGNVAHISTWLYQAEALLDEIEKKPTSKQEEIVKRLVSELDDAN

LQVENVRDQALILMNARGSSSRELVEPKLAELNRNFEKVSQHIKSAKLLIAQEPLYQCLVTTE 

TFETGVPFSDLEKLENDIENMLKFVEKHLESSDEDEKMDEESAQIEEVLQRGEEMLHQPMEDN 

KKEKIRLQLLLLHTRYNKIKAIPIQQRKMGQLASGIRSSLLPTDYLVEINKILLCMDDVELSL 

NVPELNTAIYEDFSFQEDSLKNIKDQLDKLGEQIAVIHEKQPDVILEASGPEAIQIRDTLTQH 

GVELRQQQLEDMIIDSLQWDDHREETEELMRKYEARLYILQQARRDPLTKQISDNQILLQELG 

PGDGIVMAFGYVLQKLWREYGSDDTRNVKETTEYLKTSWINLKQSIADRQNALEAEWRTVQAS 

RRDLENFLKWIQEAETTVNVLVDASHRENALQDSILARELKQQMQDIQAEIDAHNDIFKSIDG 

NRQKMVKALGNSEEATMLQHRLDDMNQRWNDLKAKSASIRAHLEASAEKWNRLLMSLEELIKW 

LNMKDEELKKQMPIGGDVPALQLQYDHCKALRRELKEKEYSVLNAVDQARVFLADQPIEAPEE 

PRRNLQSKTELTPEERAQKIAKAMRKQSSEVKEKWESLNAVTSNWQKQVDKALEKLRDLQGAM 

DDLDADMKEAESVRNGWKPVGDLLIDSLQDHIEKIMAFREEIAPINFKVKTVNDLSSQLSPLD 

LHPSLKMSRQLDDLNMRWKLLQVSVDDRLKQLQEAHRDFGPSSQHFLSTSVQLPWQRSISHNK 

VPYYINHQTQTTCWDHPKMTELFQSLADLNNVRFSAYRTAIKIRRLQKALCLDLLELSTTNEI 

FKQHKLNQNDQLLSVPDVINCLTTTYDGLEQMHKDLVNVPLCVDMCLNWLLNVYDTGRTGKIR 

VQSLKIGLMSLSKGLLEEKYRYLFKEVAGPTEMCDQRQLGLLLHDAIQIPRQLGEVAAFGGSN 

IEPSVRSCFQQNNNKPEISVKEFIDWMHLEPQSMVWLPVLHRVAAAETAKHQAKCNICKECPI 

VGFRYRSLKHFNYDVCQSCFFSGRTAKGHKLHYPMVEYCIPTTSGEDVRDFTKVLKNKFRSKK 

YFAKHPRLGYLPVQTVLEGDNLETPITLISMWPEHYDPSQSPQLFHDDTHSRIEQYATRLAQM

ERTNGSFLTDSSSTTGSVEDEHALIQQYCQTLGGESPVSQPQSPAQILKSVEREERGELERII 

ADLEEEQRNLQVEYEQLKDQHLRRGLPVGSPPESIISPHHTSEDSELIAEAKLLRQHKGRLEA 

RMQILEDHNKQLESQLHRLRQLLEQPESDSRINGVSPWASPQHSALSYSLDPDASGPQFHQAA 

GEDLLAPPHDTSTDLTEVMEQIHSTFPSCCPNVPSRPQAM



Further analysis of the NOV39a protein yielded the properties shown in
Table 39B.



Table 39B. Protein Sequence Properties NOV39a

PSort analysis: 0.9600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV39a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 39C.



Table 39C. Geneseq Results for NOV39a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV39a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAW22017 | Utrophin - Homo sapiens, 3433 aa. [WO9722696-A1, 26-JUN-1997] | 1..2429; 1..2462 | 1995/2509 (79%); 2115/2509 (83%) | 0.0
AAB67964 | Amino acid sequence of utrophin B isoform minigene - Unidentified, 2013 aa. [WO200125461-A1, 12-APR-2001] | 1492..3064; 371..2013 | 1215/1646 (73%); 1322/1646 (79%) | 0.0
AAW22016 | Utrophin truncated polypeptide - Synthetic, 2008 aa. [WO9722696-A1, 26-JUN-1997] | 1492..3064; 366..2008 | 1214/1646 (73%); 1322/1646 (79%) | 0.0
AAP90373 | Sequence encoded by human muscular dystrophy (MD) cDNA - Homo sapiens, 3685 aa. [WO8906286-A, 13-JUL-1989] | 28..2435; 12..2429 | 974/2498 (38%); 1497/2498 (58%) | 0.0
AAP90290 | Human Duchenne muscular dystrophy gene - Homo sapiens, 3685 aa. [EP331514-A, 06-SEP-1989] | 28..2435; 12..2429 | 962/2498 (38%); 1485/2498 (58%) | 0.0



In a BLAST search of public sequence databases, the NOV39a protein was found to
have homology to the proteins shown in the BLASTP data in Table 39D.



Table 39D. Public BLASTP Results for NOV39a

Protein Accession Number | Protein/Organism/Length | NOV39a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
P46939 | Utrophin (Dystrophin-related protein 1) (DRP1) (DRP) - Homo sapiens (Human), 3433 aa. | 1..2429; 1..2462 | 2055/2509 (81%); 2152/2509 (84%) | 0.0
CAA03735 | SEQUENCE 9 FROM PATENT WO9722696 - unidentified, 3433 aa. | 1..2429; 1..2462 | 1996/2509 (79%); 2116/2509 (83%) | 0.0
O08614 | Cytoskeletal protein - Mus musculus (Mouse), 3429 aa. | 1..2556; 1..2540 | 1795/2616 (68%); 2028/2616 (76%) | 0.0
O55147 | Utrophin - Rattus norvegicus (Rat), 3419 aa. | 1..2243; 1..2266 | 1728/2297 (75%); 1911/2297 (82%) | 0.0
CAC37761 | Sequence 8 from Patent WO0125461 - synthetic construct, 2013 aa. | 1492..3064; 371..2013 | 1215/1646 (73%); 1322/1646 (79%) | 0.0



Pfam analysis predicts that the NOV39a protein contains the domains shown in
Table 39E.



Table 39E. Domain Analysis of NOV39a

Pfam Domain | NOV39a Match Region | Identities; Similarities for the Matched Region | Expect Value
CH | 31..135 | 45/125 (36%); 91/125 (73%) | 6.1e-35
CH | 150..255 | 32/125 (26%); 92/125 (74%) | 1.9e-35
spectrin | 309..417 | 32/110 (29%); 77/110 (70%) | 3.3e-17
spectrin | 418..526 | 26/111 (23%); 82/111 (74%) | 8.1e-14
spectrin | 541..637 | 20/99 (20%); 65/99 (66%) | 0.042
spectrin | 687..798 | 28/113 (25%); 72/113 (64%) | 0.67
spectrin | 803..902 | 18/104 (17%); 68/104 (65%) | 0.0039
spectrin | 1016..1083 | 16/69 (23%); 45/69 (65%) | 0.00029
spectrin | 1125..1230 | 19/109 (17%); 73/109 (67%) | 2.3e-10
DUF164 | 1069..1278 | 40/243 (16%); 111/243 (46%) | 0.85
spectrin | 1248..1334 | 29/93 (31%); 52/93 (56%) | 0.0006
spectrin | 1544..1649 | 18/109 (17%); 72/109 (66%) | 1.4e-05
spectrin | 1652..1753 | 20/108 (19%); 69/108 (64%) | 0.065
spectrin | 2037..2071 | 12/37 (32%); 27/37 (73%) | 0.0066
spectrin | 2074..2187 | 27/115 (23%); 82/115 (71%) | 1.8e-09
spectrin | 2190..2267 | 15/81 (19%); 56/81 (69%) | 0.008
spectrin | 2322..2428 | 27/110 (25%); 71/110 (65%) | 4.8e-05
WW | 2445..2474 | 10/30 (33%); 24/30 (80%) | 5.1e-08
ZZ | 2695..2740 | 21/47 (45%); 45/47 (96%) | 7.5e-23
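
The match regions in Table 39E are written as inclusive residue ranges, so the number of NOV39a residues covered by each domain hit follows directly from the endpoints. The sketch below assumes the "start..end" notation is 1-based and inclusive; the function name `region_length` is illustrative only. The denominator in the identity column presumably counts alignment columns, which can include gaps, so it need not equal the region length.

```python
def region_length(region: str) -> int:
    """Number of residues in an inclusive 'start..end' match region."""
    start_text, end_text = region.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"end precedes start in {region!r}")
    return end - start + 1


# Selected rows from Table 39E: (Pfam domain, NOV39a match region)
rows = [("CH", "31..135"), ("CH", "150..255"), ("WW", "2445..2474"), ("ZZ", "2695..2740")]
for domain, region in rows:
    print(domain, region, region_length(region))
```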



Example 40. 

The NOV40 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 40A. 




Table 40A. NOV40 Sequence Analysis 




SEQ ID NO: 159 | 1677 bp

NOV40a, CG122738-01 DNA Sequence


GCCGCACCGGCAGCCATGAGCTCGGAGATGGAGCCGCTGCTCCTGGCCTGGAGCTATTTTAGG 
CGCAGGAAGTTCCAGCTCTGCGCCGATCTATGCACGCAGATGCTGGAGAAGTCCCCTTATGAC 
CAGGCAGCTTGGATCTTAAAAGCAAGAGCGCTAACAGAAATGGTATACATAGATGAAATTGAT 
GTAGATCAGGAAGGAATTGCAGAAATGATGCTGGATGAAAATGCTATAGCTCAAGTTCCACGT 
CCTGGAACGTCTTTGAAACTCCCTGGAACTAATCAGACAGGAGGGCCTAGCCAGGCCGTTAGG 
CCAATCACACAAGCTGGAAGACCCATTACAGGTTTCCTCAGGCCCAGCACGCAGAGTGGAAGG 
CCAGGCACTATGGAACAGGCTATCAGAACACCCAGAACCGCCTACACAGCCCGCCCTATCACC 
AGCTCCTCCGGAAGATTTGTCAGGCTGGGAACGGCTTCCATGCTTACAAGTCCTGATGGACCA
TTTATAAATTTATCTAGGCTGAATTTAACAAAGTATTCCCAGAAACCTAAGTTGGCAAAGGCT 
TTGTTTGAGTATATCTTTCATCATGAAAATGATGTTAAGGCTTTGGATCTGGCTGCCCTCTCC
ACAGAACATTCTCAGTACAAGGACTGGTGGTGGAAAGTACAGATTGGAAAATGTTACTACAGG 
TTGGGAATGTATCGTGAAGCAGAAAAACAGTTTAAATCAGCCCTGAAGCAGCAGGAAATGGTA
GATACATTTCTGTACTTGGCAAAAGTATATGTCTCATTGGATCAACCTGTGACTGCTTTAAAT 
CTTTTCAAACAAGGCTTAGATAAGTTTCCAGGAGAAGTAACCCTGCTCTGTGGAATTGCAAGA
ATCTATGAGGAAATGAACAATATGTCATCAGCAGCAGAATATTACAAAGAAGTTTTGAAACAA
GACAATACTCATGTGGAAGCCATCGCATGCATTGGAAGCAACCACTTCTATTCTGATCAGCCA 
GAAATAGCTCTCCGGTTTTACAGGCGGCTGCTGCAGATGGGCATTTATAACGGCCAGCTTTTT 
AACAATCTGGGGCTGTGTTGCTTCTATGCCCAGCAGTATGATATGACTCTGACCTCATTTGAA
CGTGCCCTTTCTTTGGCTGAAAATGAAGAAGAGGCAGCTGATGTCTGGTACAACTTGGGACAT
GTAGCTGTGGGAATAGGAGATACAAATTTGGCCCATCAGTGCTTCAGGCTGGCTCTGGTCAAC 
AACAACAACCACGCCGAGGCCTACAACAACCTGGCTGTGCTGGAGATGCGGAAGGGCCACGTT 
GAACAGGCAAGGGCACTATTACAAACTGCATCATCATTAGCACCCCATATGTATGAACCGCAT 
TTTAATTTTGCAACAATCTCTGATAAGATTGGAGATCTGCAGAGAAGCTATGTTGCTGCGCAG
AAGTCTGAAGCAGCATTTCCAGACCATGTGGACACACAACATTTAATTAAACAATTAAGGCAG 
CATTTTGCTATGCTCTGATTGTTCCTTAGACCACATATGTTCTTATGAAGCAGCATTATGCAA


GGGGAAAAAAGCACTATGTCTGTGTATGTATGTATATAGTGTAATACGTATATTTTAACAAAC 


CTGTCCTTGATATTAGTTAAGGTGACACATAAGGGTGAC 




ORF Start: ATG at 16 | ORF Stop: TGA at 1528

SEQ ID NO: 160 | 504 aa | MW at 57182.6kD

NOV40a, CG122738-01 Protein Sequence


MSSEMEPLLLAWSYFRRRKFQLCADLCTQMLEKSPYDQAAWILKARALTEMVYIDEIDVDQEG 
IAEMMLDENAIAQVPRPGTSLKLPGTNQTGGPSQAVRPITQAGRPITGFLRPSTQSGRPGTME 
QAIRTPRTAYTARPITSSSGRFVRLGTASMLTSPDGPFINLSRLNLTKYSQKPKLAKALFEYI
FHHENDVKALDLAALSTEHSQYKDWWWKVQIGKCYYRLGMYREAEKQFKSALKQQEMVDTFLY 
LAKVYVSLDQPVTALNLFKQGLDKFPGEVTLLCGIARIYEEMNNMSSAAEYYKEVLKQDNTHV
EAIACIGSNHFYSDQPEIALRFYRRLLQMGIYNGQLFNNLGLCCFYAQQYDMTLTSFERALSL 
AENEEEAADVWYNLGHVAVGIGDTNLAHQCFRLALVNNNNHAEAYNNLAVLEMRKGHVEQARA
LLQTASSLAPHMYEPHFNFATISDKIGDLQRSYVAAQKSEAAFPDHVDTQHLIKQLRQHFAML 
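
The relationship between the NOV40a nucleotide sequence and its encoded polypeptide in Table 40A can be illustrated by translating from the stated ORF start. The following is a minimal sketch, assuming the standard genetic code and 1-based coordinates; the names `CODON_TABLE` and `translate` are illustrative and do not come from the original disclosure. Only the first 63 nucleotides of SEQ ID NO: 159 are used here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from the 1-based position orf_start to a stop codon or the end."""
    protein = []
    for pos in range(orf_start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon: end of the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First line of the NOV40a DNA sequence (SEQ ID NO: 159); ORF Start: ATG at 16
first_line = "GCCGCACCGGCAGCCATGAGCTCGGAGATGGAGCCGCTGCTCCTGGCCTGGAGCTATTTTAGG"
print(translate(first_line, 16))  # MSSEMEPLLLAWSYFR
```

The printed residues correspond to the start of the NOV40a protein sequence (SEQ ID NO: 160).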



Further analysis of the NOV40a protein yielded the properties shown in
Table 40B.



Table 40B. Protein Sequence Properties NOV40a

PSort analysis: 0.5944 probability located in mitochondrial matrix space; 0.3651 probability located in microbody (peroxisome); 0.3022 probability located in mitochondrial inner membrane; 0.3022 probability located in mitochondrial intermembrane space

SignalP analysis: No Known Signal Sequence Predicted






A search of the NOV40a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 40C.



Table 40C. Geneseq Results for NOV40a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV40a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
ABG18795 | Novel human diagnostic protein #18786 - Homo sapiens, 550 aa. [WO200175067-A2, 11-OCT-2001] | 226..504 / 270..550 | 254/281 (90%) / 262/281 (92%) | e-142
AAM41765 | Human polypeptide SEQ ID NO 6696 | 1..240 / 20..260 | 234/241 (97%) / 236/241 (97%) | e-133
ABG18794 | Novel human diagnostic protein #18785 - Homo sapiens, 207 aa. [WO200175067-A2, 11-OCT-2001] | 38..243 / 1..207 | 191/207 (92%) / 197/207 (94%) | e-105
AAB53386 | Human colon cancer antigen protein sequence SEQ ID NO:926 - Homo sapiens, 220 aa. [WO200055351-A1, 21-SEP-2000] | 339..504 / 55..220 | 166/166 (100%) / 166/166 (100%) | 2e-93
ABG18793 | Novel human diagnostic protein #18784 - Homo sapiens, 142 aa. [WO200175067-A2, 11-OCT-2001] | 14..154 / 2..142 | 141/141 (100%) / 141/141 (100%) | 1e-76



In a BLAST search of public sequence databases, the NOV40a protein was found to
have homology to the proteins shown in the BLASTP data in Table 40D.



Table 40D. Public BLASTP Results for NOV40a

Protein Accession Number | Protein/Organism/Length | NOV40a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q8TAM2 | Similar to putative - Homo sapiens (Human), 531 aa. | 1..504 / 1..531 | 504/531 (94%) / 504/531 (94%) | 0.0
Q9DCP7 | 0610012F22Rik protein - Mus musculus (Mouse), 505 aa. | 1..504 / 1..505 | 484/505 (95%) / 497/505 (97%) | 0.0
Q8VD72 | Similar to RIKEN cDNA 0610012F22 gene - Mus musculus (Mouse), 515 aa. | 1..504 / 1..515 | 484/515 (93%) / 497/515 (95%) | 0.0
CAD38757 | Hypothetical protein - Homo sapiens (Human), 481 aa (fragment). | 1..504 / 7..481 | 474/505 (93%) / 474/505 (93%) | 0.0
Q96DG8 | Similar to RIKEN cDNA 0610012F22 gene - Homo sapiens (Human), 353 aa (fragment). | 153..504 / 1..353 | 352/353 (99%) / 352/353 (99%) | 0.0
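
The identity percentages printed in Table 40D appear consistent with integer truncation of matches over aligned length; for example, 504/531 is about 94.9% and is listed as 94%. The sketch below reproduces that arithmetic for the identity column only. Truncation is an inference from the printed values, not a documented property of the reporting software, and the function name `truncated_percent` is hypothetical.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching positions, truncated (not rounded) to an integer.

    Assumption: the report drops the fractional part of the percentage.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Identity fractions taken from Table 40D.
for matches, length in [(504, 531), (484, 505), (352, 353)]:
    print(f"{matches}/{length} -> {truncated_percent(matches, length)}%")
```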






PFam analysis predicts that the NOV40a protein contains the domains shown in the
Table 40E.

Table 40E. Domain Analysis of NOV40a

Pfam Domain | NOV40a Match Region | Identities/Similarities for the Matched Region | Expect Value
TPR | 214..247 | 9/34 (26%) / 23/34 (68%) | 0.92
TPR | 281..314 | 6/34 (18%) / 24/34 (71%) | 0.019
TPR | 349..382 | 10/34 (29%) / 24/34 (71%) | 0.0013
TPR | 386..419 | 10/34 (29%) / 23/34 (68%) | 0.021
TPR | 420..453 | 11/34 (32%) / 25/34 (74%) | 0.015
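
Each TPR match region in Table 40E spans 34 residues, which agrees with the denominator of 34 in the identity and similarity fractions. A minimal sketch of that check, assuming the `start..end` ranges are inclusive residue numbers; the helper name `region_length` is hypothetical.

```python
def region_length(region: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start_text, end_text = region.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"range end precedes start: {region}")
    return end - start + 1


# Match regions listed for the five TPR hits in Table 40E.
tpr_regions = ["214..247", "281..314", "349..382", "386..419", "420..453"]
lengths = [region_length(r) for r in tpr_regions]
print(lengths)
```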



Example 41. 

The NOV41 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 41A.




Table 41A. NOV41 Sequence Analysis 




SEQ ID NO: 161 | 3397 bp

NOV41a, CG123451-01 DNA Sequence


GCGGGACGCGCAGCGCTATGGCAGAGGGCAGCGGGGAAGTGGTCGCAGTGTCTGCGACCGGGG 
CTGCCAACGGCCTCAACAATGGGGCAGGCGGGACCTCGGCGACGACCTGCAACCCGCTGTCGC 
GCAAGCTGCATAAGATCCTGGAGACGCGGCTGGACAACGACAAGGAGATGTTAGAAGCTCTCA
AGGCACTTTCAACCTTTTTTGTTGAAAATAGTCTGCGGACTCGAAGAAATTTACGTGGAGATA 
TTGAACGTAAAAGTTTAGCCATCAATGAAGAATTTGTAAGCATTTTCAAGGAAGTGAAGGAGG 
AACTTGAAAGCATAAGCGAAGATGTTCAAGCAATGAGCAACTGTTGTCAAGATATGACAAGTC 
GCCTACAGGCAGCAAAGGAACAGACTCAAGATTTAATAGTTAAAACCACTAAGCTTCAATCTG 
AAAGCCAAAAATTAGAGATAAGAGCTCAAGTTGCAGATGCCTTCTTATCCAAGTTCCAACTGA
CTTCTGATGAAATGAGTCTTCTTCGAGGTACAAGAGAAGGACCCATTACTGAGGATTTTTTCA 
AGGCACTGGGAAGAGTAAAACAGATTCATAATGATGTCAAAGTTCTCTTGCGTACAAATCAAC 
AAACGGCAGGTTTAGAAATTATGGAACAGATGGCCTTACTTCAAGAAACGGCTTATGAAAGAC 
TTTACCGATGGGCTCAAAGTGAATGCAGAACATTGACACAAGAATCATGTGACGTATCTCCAG
TATTGACACAGGCAATGGAAGCCCTGCAGGACAGACCTGTCTTATATAAATATACCTTAGATG 
AATTTGGAACAGCCAGAAGAAGTACAGTTGTTCGTGGATTTATTGATGCGCTCACAAGAGGGG 
GCCCCGGAGGTACACCTAGACCAATTGAAATGCATTCTCATGACCCTTTGAGGTATGTAGGAG 
ATATGTTGGCTTGGCTCCATCAAGCTACTGCTTCTGAAAAGGAACACCTTGAAGCTCTCTTAA
AGCATGTAACTACACAAGGTGTTGAAGAAAATATTCAAGAAGTTGTTGGGCATATCACTGAAG
GTGTGTGCAGGCCTCTAAAGGTTCGAATTGAGCAAGTAATAGTTGCTGAACCTGGGGCAGTTT 
TATTATATAAAATTTCTAATCTCCTCAAATTTTATCACCATACAATCAGTGGTATTGTTGGAA 
ATAGTGCAACTGCATTATTGACTACCATTGAAGAAATGCATTTGCTAAGCAAAAAAATATTCT 
TCAATAGCTTGAGTCTTCATGCAAGTAAATTAATGGACAAGGTTGAACTCCCACCACCTGATC 
TTGGACCAAGTTCTGCACTAAATCAGACACTCATGTTGCTGCGTGAAGTTTTAGCATCTCACG 
ATTCTTCAGTTGTACCATTAGATGCTCGTCAAGCTGATTTTGTGCAGGTTTTATCATGTGTCT 
TGGATCCTCTCCTACAGATGTGTACTGTATCAGCCAGCAATTTAGGCACAGCTGACATGGCCA 
CTTTCATGGTCAATTCACTATATATGATGAAGACAACATTAGCTCTATTTGAATTCACTGACA 
GACGTCTGGAAATGCTACAGTTTCAGATCGAAGCACATTTGGACACACTTATAAATGAGCAAG 
CCTCTTATGTTTTAACTAGGGTAGGCTTGAGTTACATCTATAACACTGTACAGCAACATAAAC 
CTGAACAGGGCTCTTTAGCTAATATGCCCAACCTAGATTCTGTGACACTGAAGGCTGCAATGG
TTCAGTTTGATCGTTATCTGTCAGCCCCAGACAACCTATTGATACCACAGCTGAACTTTCTTC 
TAAGTGCCACAGTGAAAGAGCAGATCGTAAAACAATCTACAGAATTAGTCTGCAGAGCCTATG
GTGAAGTGTATGCAGCCGTGATGAATCCAATCAATGAATACAAAGATCCAGAGAACATTCTTC
ACCGATCGCCGCAGCAAGTGCAGACGCTTCTTTCCTGATTATCTTATTTCATTGTGTTAGCAA
AATATGACCTCCCTAAAACACTGAAGGTTATTTTTTATTCTTTGAATTTTTACTTTATAATTT


GATAGTTACAGTTTTCTTTGTATCATAAGATTGTAAGTCCCGATAATTTTTTTTTTTTTGGTC 


TCAGTAACAGGGAAGTAAGTAACATGTTGACCTGAGCTAGTATTGCTGTGTATCTACTCTAAA 


TGAGATGATCTATTTTTTTGCTAGCCATCTCTCCAGCTCTGCAGTTTTCACTGTATTCAGGAA 


GCATAAAGTAGTATGAAAGGTTTGAAGAATTTTTTTTTACAAGACTAGTTCTAAATTAACAGC 


TTATAAAAAATTTGTCTAAATTTAATAATTAGTATAAGGATATGACCTAATAAATGTCTCCTT 




ACCTAAAGATTCATTTGCTTTCTTTTAATATGAGTAGGCATACTTAGTAGCTTTTCTGAACCT


AGCCTATGTCTCTGTCCCCAAAATAGCTGCCCTTAAAGAGTTGTTAGCAGAGAGAAAAATAAC 


AGTGAATGTGCTCCTGGTGTATATGGCAGTGAATCTCCTTTCTGTTCTACTTTAGCATACTAT 


ATATATTTGACTGTGTACATTCTTATGCAATTTTAAGTATACACTCAGCAATAATTAGAAAAA


AAGGAGAGAGAAAAGTGATTTAAACAGGGTGGATTCCACTCTGTGGGAGCCTTCGATGGAACT


CAAGGTGGAGCTCAGCCTTTCCAATGAGCTCTAAGCATGTAGATAGCCTGAGCTGTGTCTAAG 


CCTGGTGTTTAAAGATGGGTATTTGTCATACAATATGGGTCCTAAATCCAACCAACTACACAT 


TTTATCTGGTGTTCAAACCAAAGAAACAATGATCTACTCAAACATTGGAGAAAAAAACTGCCA 


GAGGAGGAGTTGCCAATTGGCAGTGTGTCTTATCTCCATGTTGTAACTGGACTCTGACTTTAG 


ACCATTACCTATTAGGAAGATTAAAAATGACTGTATTTTTAAAGGAATAAATCCCAGTGTGCC 


TGATTTGACATTCTTGTCAGCAAAAAAAACTTAATTTCTAGTAAATCTATAAAAATGGGTAAG 




TCCCTAAATTACAAATGAGAAAATTGAAGCACAAGGAAAAAAATAACTAGTTTGAAATATTTT 


GAAAAGTAATAACATAAAACTAGTATTTGTAGAAGATTATGTGTTGTATATAACAAATTAGTA 


TTTATAGAATATGACCTATTTATCTGAAGTTTATAATTGTTTATACCTAATACAGTTCTTTTT


GGAGTAAGAATGATTATATAATCGTTATCCATTTGGGTATAAATCTGTATTTTTAGTTTTTTC 


CCTTTGATTAGTATGTGTTACATATAAAGACAGAAAATAAAGTATAAATCAAGAGCTT 


ORF Start: ATG at 18 | ORF Stop: TGA at 1989

SEQ ID NO: 162 | 657 aa | MW at 73278.2 kD

NOV41a, CG123451-01 Protein Sequence


MAEGSGEVVAVSATGAANGLNNGAGGTSATTCNPLSRKLHKILETRLDNDKEMLEALKALSTF
FVENSLRTRRNLRGDIERKSLAINEEFVSIFKEVKEELESISEDVQAMSNCCQDMTSRLQAAK 
EQTQDLIVKTTKLQSESQKLEIRAQVADAFLSKFQLTSDEMSLLRGTREGPITEDFFKALGRV 
KQIHNDVKVLLRTNQQTAGLEIMEQMALLQETAYERLYRWAQSECRTLTQESCDVSPVLTQAM 
EALQDRPVLYKYTLDEFGTARRSTVVRGFIDALTRGGPGGTPRPIEMHSHDPLRYVGDMLAWL
HQATASEKEHLEALLKHVTTQGVEENIQEVVGHITEGVCRPLKVRIEQVIVAEPGAVLLYKIS
NLLKFYHHTISGIVGNSATALLTTIEEMHLLSKKIFFNSLSLHASKLMDKVELPPPDLGPSSA 
LNQTLMLLREVLASHDSSVVPLDARQADFVQVLSCVLDPLLQMCTVSASNLGTADMATFMVNS
LYMMKTTLALFEFTDRRLEMLQFQIEAHLDTLINEQASYVLTRVGLSYIYNTVQQHKPEQGSL 
ANMPNLDSVTLKAAMVQFDRYLSAPDNLLIPQLNFLLSATVKEQIVKQSTELVCRAYGEVYAA
VMNPINEYKDPENILHRSPQQVQTLLS 
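
The relationship between the nucleotide sequence and the encoded polypeptide in Table 41A can be illustrated by translating from the listed ORF start. This is a minimal sketch using the standard genetic code; the function name `translate` and its 1-based `start` parameter are hypothetical, and the input is the first line of the NOV41a DNA sequence with the ORF start taken as position 18.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int = 1) -> str:
    """Translate `dna` from the 1-based position `start` until a stop codon.

    Whitespace is ignored; codons containing unexpected symbols become 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the ORF
            break
        protein.append(amino_acid)
    return "".join(protein)


# First line of the NOV41a DNA sequence; the ORF is listed as starting at 18.
first_line = "GCGGGACGCGCAGCGCTATGGCAGAGGGCAGCGGGGAAGTGGTCGCAGTGTCTGCGACCGGGG"
print(translate(first_line, start=18))
```

Because whitespace is discarded before translation, the same function can be applied to sequence lines that still contain stray spaces from text extraction.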



Further analysis of the NOV41a protein yielded the following properties shown in
Table 41B.

Table 41B. Protein Sequence Properties NOV41a

PSort analysis: 0.5500 probability located in endoplasmic reticulum (membrane); 0.1900 probability located in lysosome (lumen); 0.1000 probability located in endoplasmic reticulum (lumen); 0.1000 probability located in outside

SignalP analysis: No Known Signal Sequence Predicted






A search of the NOV41a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 41C.

Table 41C. Geneseq Results for NOV41a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV41a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM41398 | Human polypeptide SEQ ID NO 6329 - Homo sapiens, 670 aa. [WO200153312-A1, 26-JUL-2001] | 1..657 / 14..670 | 655/657 (99%) / 655/657 (99%) | 0.0
AAM39612 | Human polypeptide SEQ ID NO 2757 - Homo sapiens, 656 aa. [WO200153312-A1, 26-JUL-2001] | 1..657 / 1..656 | 653/657 (99%) / 653/657 (99%) | 0.0
ABB58712 | Drosophila melanogaster polypeptide SEQ ID NO 2928 - Drosophila melanogaster, 630 aa. [WO200171042-A2, 27-SEP-2001] | 35..657 / 14..628 | 262/632 (41%) / 411/632 (64%) |
ABP04367 | Human ORFX protein sequence SEQ ID NO:8716 - Homo sapiens, 122 aa. [WO200192523-A2, 06-DEC-2001] | 516..636 / 1..121 | 109/121 (90%) / 118/121 (97%) | 2e-57
AAB41840 | Human ORFX ORF1604 polypeptide sequence SEQ ID NO:3208 - Homo sapiens, 107 aa. [WO200058473-A2, 05-OCT-2000] | 53..159 / 1..107 | 106/107 (99%) / 106/107 (99%) | 4e-51



In a BLAST search of public sequence databases, the NOV41a protein was found to
have homology to the proteins shown in the BLASTP data in Table 41D.

Table 41D. Public BLASTP Results for NOV41a

Protein Accession Number | Protein/Organism/Length | NOV41a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9Y2V7 | Hypothetical 68.1 kDa protein - Homo sapiens (Human), 605 aa. | 53..657 / 1..605 | 605/605 (100%) / 605/605 (100%) | 0.0
Q9ULT5 | KIAA1134 protein - Homo sapiens (Human), 611 aa (fragment). | 5..609 / 1..605 | 605/605 (100%) / 605/605 (100%) | 0.0
Q8R3I3 | Similar to KIAA1134 protein - Mus musculus (Mouse), 605 aa. | 53..657 / 1..605 | 559/605 (92%) / 587/605 (96%) | 0.0
Q9V564 | CG1968 protein - Drosophila melanogaster (Fruit fly), 630 aa. | 35..657 / 14..628 | 262/632 (41%) / 411/632 (64%) | e-147
Q9C6R8 | Hypothetical 78.8 kDa protein - Arabidopsis thaliana (Mouse-ear cress), 706 aa. | 35..656 / 11..704 | 236/707 (33%) / 375/707 (52%) | e-103
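
The Expect values in these tables mix three notations: `0.0`, full scientific notation such as `2e-57`, and a bare exponent such as `e-147`. The sketch below converts all three to floating-point numbers, assuming that a bare exponent is shorthand for a mantissa of 1 (so `e-147` means 1e-147), as is common in legacy BLAST reports. The function name `parse_expect` is hypothetical.

```python
def parse_expect(text: str) -> float:
    """Convert an Expect value string from a BLAST-style table to a float.

    Assumption: a bare exponent such as 'e-147' stands for '1e-147'.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


# Expect values as they appear in Tables 41C and 41D.
for raw in ["0.0", "2e-57", "e-147", "e-103"]:
    print(raw, "->", parse_expect(raw))
```

Normalizing the values this way makes hits sortable by significance regardless of how each value was printed.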



PFam analysis predicts that the NOV41a protein contains the domains shown in the
Table 41E.

Table 41E. Domain Analysis of NOV41a

Pfam Domain | NOV41a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 42. 

The NOV42 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 42A. 



Table 42A. NOV42 Sequence Analysis

SEQ ID NO: 163 | 3779 bp

NOV42a, CG123660-01 DNA Sequence



GCGGCCGCCCCGGCTCCCGGCTGCAGGAATCGCGCCAGGACGCTGGCCCCGCTCGCGGCTAGC 



TTGCACGCCAGGGCACAGCGAGGATGGGAGGGTCGCAGTCCCTGCAGCCAGCCCCAGCCAGCG



ACCTGAACCTGGAGGCTTCCGAGGCAATGAGTTCCGATTCTGAAGAGGCATTTGAGACCCCGG 
AGTCAACGACCCCTGTCAAAGCTCCGCCAGCTCCACCCCCACCACCCCCCGAAGTCATCCCAG 
AACCCGAGGTCAGCACACAGCCACCCCCGGAAGAACCAGGATGTGGTTCTGAGACAGTCCCTG 
TCCCTGATGGCCCACGGAGCGACTCGGTGGAAGGAAGTCCCTTCCGTCCCCCGTCACACTCCT 
TCTCTGCCGTCTTCGATGAAGACAAGCCGATAGCCAGCAGTGGGACTTACAACTTGGACTTTG 
ACAACATTGAGCTTGTGGATACCTTTCAGACCTTGGAGCCTCGTGCCTCAGACGCTAAGAATC 
AGGAGGGCAAAGTGAACACACGGAGGAAGTCCACGGATTCCGTCCCCATCTCTAAGTCTACAC 
TGTCCCGGTCGCTCAGCCTGCAAGCCAGTGACTTTGATGGTGCTTCTTCCTCAGGCAATCCCG 
AGGCCGTGGCCCTTGCCCCAGATGCATATAGCACGGGTTCCAGCAGTGCTTCTAGTACCCTTA 
AGCGAACTAAAAAACCGAGGCCGCCTTCCTTAAAAAAGAAACAGACCACCAAGAAACCCACAG
AGACCCCCCCAGTGAAGGAGACGCAACAGGAGCCAGATGAAGAGAGCCTTGTCCCCAGTGGGG
AGAATCTAGCATCTGAGACGAAAACGGAATCTGCCAAGACGGAAGGTCCTAGCCCAGCCTTAT
TGGAGGAGACGCCCCTTGAGCCCGCTGTGGGGCCCAAAGCTGCCTGCCCTCTGGACTCAGAGA 
GTGCAGAAGGGGTTGTCCCCCCGGCTTCTGGAGGTGGCAGAGTGCAGAACTCACCCCCTGTCG 
GGAGGAAAACGCTGCCTCTTACCACGGCCCCGGAGGCAGGGGAGGTAACCCCATCGGATAGCG 
GGGGGCAAGAGGACTCTCCAGCCAAAGGGCTCTCCGTAAGGCTGGAGTTTGACTATTCTGAGG 
ACAAGAGTAGTTGGGACAACCAGCAGGAAAACCCCCCTCCTACCAAAAAGATAGGCAAAAAGC 
CAGTTGCCAAAATGCCCCTGAGGAGGCCAAAGATGAAAAAGACACCCGAGAAACTTGACAACA 
CTCCTGCCTCACCTCCCAGATCCCCTGCTGAACCCAATGACATCCCCATTGCTAAAGGTACTT 
ACACCTTTGATATTGACAAGTGGGATGACCCCAATTTTAACCCTTTTTCTTCCACCTCAAAAA 
TGCAGGAGTCTCCCAAACTGCCCCAACAATCATACAACTTTGACCCAGACACCTGTGATGAGT 
CCGTTGACCCCTTTAAGACATCCTCTAAGACCCCCAGCTCACCTTCTAAATCCCCAGCCTCCT 
TTGAGATCCCAGCCAGTGCTATGGAAGCCAATGGAGTGGACGGGGATGGGCTAAACAAGCCCG 
CCAAGAAGAAGAAGACGCCCCTAAAGACTGACACATTTAGGGTGAAAAAGTCGCCAAAACGGT 
CTCCTCTCTCTGATCCACCTTCCCAGGACCCCACCCCAGCTGCTACACCAGAAACACCACCAG 
TGATCTCTGCGGTGGTCCACGCCACAGATGAGGAAAAGCTGGCGGTCACCAACCAGAAGTGGA 
TGTGCATGACAGTGGACCTAGAGGCTGACAAACAGGACTACCCGCAGCCCTCGGACCTGTCCA 
CCTTTGTAAACGAGACCAAATTCAGTTCACCCACTGAGGGTAAGCAACTCTGTAGCCAGCTGG 
ACCCCCACTCTGCCTCGGAGAATCCTGCCCCTAGGGAGCCAAAGGCCAGGAGAGAAACTTCTC 
AGCCAGAGTTGGATTACAGAAACTCCTATGAAATTGAATATATGGAGAAAATTGGCTCCTCCT 
TACCTCAGGACGACGATGCCCCGAAGAAGCAGGCCTTGTACCTTATGTTTGACACTTCTCAGG 
AGAGCCCTGTCAAGTCATCTCCCGTCCGCATGTCAGAGTCCCCGACGCCGTGTTCAGGGTCAA 
GTTTTGAAGAGACTGAAGCCCTTGTGAACACTGCTGCGAAAAACCAGCATCCTGTCCCACGAG 
GACTGGCCCCTAACCAAGAGTCACACTTGCAGGTGCCAGAGAAATCCTCCCAGAAGGAGCTGG 
AGGCCATGGGCTTGGGCACCCCTTCAGAAGCGATTGAAATTACAGCTCCCGAGGGCTCCTTTG 
CCTCTGCTGACGCCCTCCTCAGCAGGCTAGCTCACCCCGTCTCTCTCTGTGGTGCACTTGACT 
ATCTGGAGCCCGACTTAGCAGAAAAGAACCCCCCACTATTCGCTCAGAAACTCCAGGAGGAGT
TAGAGTTTGCCATCATGCGGATAGAAGCCCTGAAGCTGGCCAGGCAGATTGCTTTGGCTTCCC
GCAGCCACCAGGATGCCAAGAGAGAGGCTGCTCACCCAACAGACGTCTCCATCTCCAAAACAG 
CCTTGTACTCCCGCATCGGGACCGCTGAGGTGGAGAAACCTGCAGGCCTTCTGTTCCAGCAGC 
CCGACCTGGACTCTGCCCTCCAGATCGCCAGAGCAGAGATCATAACCAAGGAGAGAGAGGTCT 
CAGAATGGAAAGATAAATATGAAGAAAGCAGGCGGGAAGTGATGGAAATGAGGAAAATAGTGG 
CCGAGTATGAGAAGACCATCGCTCAGATGATAGAGGACGAACAGAGAGAGAAGTCAGTCTCCC 
ACCAGACGGTGCAGCAGCTGGTTCTGGAGAAGGAGCAAGCCCTGGCCGACCTGAACTCCGTGG 
AGAAGTCTCTGGCCGACCTCTTCAGAAGATATGAGAAGATGAAGGAGGTCCTAGAAGGCTTCC 
GCAAGAATGAAGAGGTGTTGAAGAGATGTGCGCAGGAGTACCTGTCCCGGGTGAAGAAGGAGG 
AGCAGAGGTACCAGGCCCTGAAGGTGCACGCGGAGGAGAAACTGGACAGGGCCAATGCTGAGA 
TTGCTCAGGTTCGAGGCAAGGCCCAGCAGGAGCAAGCCGCCCACCAGGCCAGCCTGCGGAAGG 
AGCAGCTGCGAGTGGACGCCCTGGAAAGGACGCTGGAGCAGAAGAATAAAGAAATAGAAGAAC 
TCACCAAGATTTGTGACGAACTGATTGCCAAAATGGGGAAAAGCTAACTCTGAACCGAATGTT 
TTGGACTTAACTGTTGCGGCAATATGACCGTCGGCACACTGCTGTTCCTCCAGTTCCATGGAC 


AGGTTCTGTTTTCACTTTTTCGTATGCACTACTGTATTTCCTTTCTAAATAAAATTGATTTGA 


TTGTATGCAGTACTAAGGAGACTATCAGAATTTCTTGCTATTGGTTTGCATTTTCCTAGTATA 


ATTCATAGCAAGTTGACCTCAGAGTTCCTGTATCAGGGAGATTGTCTGATTCTCTAATAAAAG 


ACACATTGCTGACCTTGGCCTTGCCCTTTGTACACAAGTTCCCAGGGTGAGCAGCTTTTGGAT 


TTAATATGAACATGTACAGCGTGCATAGGGACTCTTGCCTTAAGGAGTGTAAACTTGATCTGC 


ATTTGCTGATTTGTTTTTAAAAAAACAAGAAATGCATGTTTCAAATAAAATTCTCTATTGTAA 


ATAAAATTTTTTCTTTGGATCTTGAAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 87 | ORF Stop: TAA at 3258

SEQ ID NO: 164 | 1057 aa | MW at 115537.2 kD

NOV42a, CG123660-01 Protein Sequence
Sequence 


MGGSQSLQPAPASDLNLEASEAMSSDSEEAFETPESTTPVKAPPAPPPPPPEVIPEPEVSTQP 

PPEEPGCGSETVPVPDGPRSDSVEGSPFRPPSHSFSAVFDEDKPIASSGTYNLDFDNIELVDT 

FQTLEPRASDAKNQEGKVNTRRKSTDSVPISKSTLSRSLSLQASDFDGASSSGNPEAVALAPD 

AYSTGSSSASSTLKRTKKPRPPSLKKKQTTKKPTETPPVKETQQEPDEESLVPSGENLASETK 

TESAKTEGPSPALLEETPLEPAVGPKAACPLDSESAEGVVPPASGGGRVQNSPPVGRKTLPLT

TAPEAGEVTPSDSGGQEDSPAKGLSVRLEFDYSEDKSSWDNQQENPPPTKKIGKKPVAKMPLR 

RPKMKKTPEKLDNTPASPPRSPAEPNDIPIAKGTYTFDIDKWDDPNFNPFSSTSKMQESPKLP 

QQSYNFDPDTCDESVDPFKTSSKTPSSPSKSPASFEIPASAMEANGVDGDGLNKPAKKKKTPL 

KTDTFRVKKSPKRSPLSDPPSQDPTPAATPETPPVISAVVHATDEEKLAVTNQKWMCMTVDLE

ADKQDYPQPSDLSTFVNETKFSSPTEGKQLCSQLDPHSASENPAPREPKARRETSQPELDYRN

SYEIEYMEKIGSSLPQDDDAPKKQALYLMFDTSQESPVKSSPVRMSESPTPCSGSSFEETEAL 

VNTAAKNQHPVPRGLAPNQESHLQVPEKSSQKELEAMGLGTPSEAIEITAPEGSFASADALLS 

RLAHPVSLCGALDYLEPDLAEKNPPLFAQKLQEELEFAIMRIEALKLARQIALASRSHQDAKR

EAAHPTDVSISKTALYSRIGTAEVEKPAGLLFQQPDLDSALQIARAEIITKEREVSEWKDKYE

ESRREVMEMRKIVAEYEKTIAQMIEDEQREKSVSHQTVQQLVLEKEQALADLNSVEKSLADLF
RRYEKMKEVLEGFRKNEEVLKRCAQEYLSRVKKEEQRYQALKVHAEEKLDRANAEIAQVRGKA
QQEQAAHQASLRKEQLRVDALERTLEQKNKEIEELTKICDELIAKMGKS


SEQ ID NO: 165 | 3686 bp

NOV42b, CG123660-02 DNA Sequence


GCGGCCGCCCCGGCTCCCGGCTGCAGGAATCGCGCCAGGACGCTGGCCCCGCTCGCGGCTAGC 


TTGCACGCCAGGGCACAGCGAGGATGGGAGGGTCGCAGTCCCTGCAGCCAGCCCCAGCCAGCG 
ACCTGAACCTGGAGGCTTCCGAGGCAATGAGTTCCGATTCTGAAGAGGCATTTGAGACCCCGG 
AGTCAACGACCCCTGTCAAAGCTCCGCCAGCTCCACCCCCACCACCCCCCGAAGTCATCCCAG 
AACCCGAGGTCAGCACACAGCCACCCCCGGAAGAACCAGGATGTGGTTCTGAGACAGTCCCTG 
TCCCTGATGGCCCACGGAGCGACTCGGTGGAAGGAAGTCCCTTCCGTCCCCCGTCACACTCCT 
TCTCTGCCGTCTTCGATGAAGACAAGCCGATAGCCAGCAGTGGGACTTACAACTTGGACTTTG 
ACAACATTGAGCTTGTGGATACCTTTCAGACCTTGGAGCCTCGTGCCTCAGACGCTAAGAATC 
AGGAGGGCAAAGTGAACACACGGAGGAAGTCCACGGATTCCGTCCCCATCTCTAAGTCTACAC 
TGTCCCGGTCGCTCAGCCTGCAAGCCAGTGACTTTGATGGTGCTTCTTCCTCAGGCAATCCCG 
AGGCCGTGGCCCTTGCCCCAGATGCATATAGCACGGGTTCCAGCAGTGCTTCTAGTACCCTTA 
AGCGAACTAAAAAACCGAGGCCGCCTTCCTTAAAAAAGAAACAGACCACCAAGAAACCCACAG 
AGACCCCCCCAGTGAAGGAGACGCAACAGGAGCCAGATGAAGAGAGCCTTGTCCCCAGTGGGG 
AGAATCTAGCATCTGAGACGAAAACGGAATCTGCCAAGACGGAAGGTCCTAGCCCAGCCTTAT 
TGGAGGAGACGCCCCTTGAGCCCGCTGTGGGGCCCAAAGCTGCCTGCCCTCTGGACTCAGAGA 
GTGCAGAAGGGGTTGTCCCCCCGGCTTCTGGAGGTGGCAGAGTGCAGAACTCACCCCCTGTCG
GGAGGAAAACGCTGCCTCTTACCACGGCCCCGGAGGCAGGGGAGGTAACCCCATCGGATAGCG 
GGGGGCAAGAGGACTCTCCAGCCAAAGGGCTCTCCGTAAGGCTGGAGTTTGACTATTCTGAGG 
ACAAGAGTAGTTGGGACAACCAGCAGGAAAACCCCCCTCCTACCAAAAAGATAGGCAAAAAGC 
CAGTTGCCAAAATGCCCCTGAGGAGGCCAAAGATGAAAAAGACACCCGAGAAACTTGACAACA 
CTCCTGCCTCACCTCCCAGATCCCCTGCTGAACCCAATGACATCCCCATTGCTAAAGGTACTT 
ACACCTTTGATATTGACAAGTGGGATGACCCCAATTTTAACCCTTTTTCTTCCACCTCAAAAA
TGCAGGAGTCTCCCAAACTGCCCCAACAATCATACAACTTTGACCCAGACACCTGTGATGAGT
CCGTTGACCCCTTTAAGACATCCTCTAAGACCCCCAGCTCACCTTCTAAATCCCCAGCCTCCT 
TTGAGATCCCAGCCAGTGCTATGGAAGCCAATGGAGTGGACGGGGATGGGCTAAACAAGCCCG 
CCAAGAAGAAGAAGACGCCCCTAAAGACTGACACATTTAGGGTGAAAAAGTCGCCAAAACGGT 
CTCCTCTCTCTGATCCACCTTCCCAGGACCCCACCCCAGCTGCTACACCAGAAACACCACCAG 
TGATCTCTGCGGTGGTCCACGCCACAGATGAGGAAAAGCTGGCGGTCACCAACCAGAAGTGGA 
CGTGCATGACAGTGGACCTAGAGGCTGACAAACAGGACTACCCGCAGCCCTCGGACCTGTCCA
CCTTTGTAAACGAGACCAAATTCAGTTCACCCACTGAGGAGTTGGATTACAGAAACTCCTATG 
AAATTGAATATATGGAGAAAATTGGCTCCTCCTTACCTCAGGACGACGATGCCCCGAAGAAGC 
AGGCCTTGTACCTTATGTTTGACACTTCTCAGGAGAGCCCTGTCAAGTCATCTCCCGTCCGCA 
TGTCAGAGTCCCCGACGCCGTGTTCAGGGTCAAGTTTTGAAGAGACTGAAGCCCTTGTGAACA 
CTGCTGCGAAAAACCAGCATCCTGTCCCACGAGGACTGGCCCCTAACCAAGAGTCACACTTGC 
AGGTGCCAGAGAAATCCTCCCAGAAGGAGCTGGAGGCCATGGGCTTGGGCACCCCTTCAGAAG 
CGATTGAAATTACAGCTCCCGAGGGCTCCTTTGCCTCTGCTGACGCCCTCCTCAGCAGGCTAG 
CTCACCCCGTCTCTCTCTGTGGTGCACTTGACTATCTGGAGCCCGACTTAGCAGAAAAGAACC 
CCCCACTATTCGCTCAGAAACTCCAGGAGGAGTTAGAGTTTGCCATCATGCGGATAGAAGCCC
TGAAGCTGGCCAGGCAGATTGCTTTGGCTTCCCGCAGCCACCAGGATGCCAAGAGAGAGGCTG 
CTCACCCAACAGACGTCTCCATCTCCAAAACAGCCTTGTACTCCCGCATCGGGACCGCTGAGG 
TGGAGAAACCTGCAGGCCTTCTGTTCCAGCAGCCCGACCTGGACTCTGCCCTCCAGATCGCCA 
GAGCAGAGATCATAACCAAGGAGAGAGAGGTCTCAGAATGGAAAGATAAATATGAAGAAAGCA 
GGCGGGAAGTGATGGAAATGAGGAAAATAGTGGCCGAGTATGAGAAGACCATCGCTCAGATGA 
TAGAGGACGAACAGAGAGAGAAGTCAGTCTCCCACCAGACGGTGCAGCAGCTGGTTCTGGAGA 
AGGAGCAAGCCCTGGCCGACCTGAACTCCGTGGAGAAGTCTCTGGCCGACCTCTTCAGAAGAT 
ATGAGAAGATGAAGGAGGTCCTAGAAGGCTTCCGCAAGAATGAAGAGGTGTTGAAGAGATGTG 
CGCAGGAGTACCTGTCCCGGGTGAAGAAGGAGGAGCAGAGGTACCAGGCCCTGAAGGTGCACG 
CGGAGGAGAAACTGGACAGGGCCAATGCTGAGATTGCTCAGGTTCGAGGCAAGGCCCAGCAGG 
AGCAAGCCGCCCACCAGGCCAGCCTGCGGAAGGAGCAGCTGCGAGTGGACGCCCTGGAAAGGA 
CGCTGGAGCAGAAGAATAAAGAAATAGAAGAACTCACCAAGATTTGTGACGAACTGATTGCCA
AAATGGGGAAAAGCTAACTCTGAACCGAATGTTTTGGACTTAACTGTTGCGGCAATATGACCG 


TCGGCACACTGCTGTTCCTCCAGTTCCATGGACAGGTTCTGTTTTCACTTTTTCGTATGCACT 


ACTGTATTTCCTTTCTAAATAAAATTGATTTGATTGTATGCAGTACTAAGGAGACTATCAGAA 


TTTCTTGCTATTGGTTTGCATTTTCCTAGTATAATTCATAGCAAGTTGACCTCAGAGTTCCTG 


TATCAGGGAGATTGTCTGATTCTCTAATAAAAGACACATTGCTGACCTTGGCCTTGCCCTTTG 


TACACAAGTTCCCAGGGTGAGCAGCTTTTGGATTTAATATGAACATGTACAGCGTGCATAGGG


ACTCTTGCCTTAAGGAGTGTAAACTTGATCTGCATTTGCTGATTTGTTTTTAAAAAAACAAGA 


AATGCATGTTTCAAATAAAATTCTCTATTGTAAATAAAATTTTTTCTTTGGATCTTGAAAAAA 


AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 87 | ORF Stop: TAA at 3165

SEQ ID NO: 166 | 1026 aa | MW at 112109.5 kD

NOV42b, CG123660-02 Protein Sequence

MGGSQSLQPAPASDLNLEASEAMSSDSEEAFETPESTTPVKAPPAPPPPPPEVIPEPEVSTQP 
PPEEPGCGSETVPVPDGPRSDSVEGSPFRPPSHSFSAVFDEDKPIASSGTYNLDFDNIELVDT 
FQTLEPRASDAKNQEGKVNTRRKSTDSVPISKSTLSRSLSLQASDFDGASSSGNPEAVALAPD 
AYSTGSSSASSTLKRTKKPRPPSLKKKQTTKKPTETPPVKETQQEPDEESLVPSGENLASETK 
TESAKTEGPSPALLEETPLEPAVGPKAACPLDSESAEGVVPPASGGGRVQNSPPVGRKTLPLT

TAPEAGEVTPSDSGGQEDSPAKGLSVRLEFDYSEDKSSWDNQQENPPPTKKIGKKPVAKMPLR 
RPKMKKTPEKLDNTPASPPRSPAEPNDIPIAKGTYTFDIDKWDDPNFNPFSSTSKMQESPKLP
QQSYNFDPDTCDESVDPFKTSSKTPSSPSKSPASFEIPASAMEANGVDGDGLNKPAKKKKTPL 
KTDTFRVKKSPKRSPLSDPPSQDPTPAATPETPPVISAVVHATDEEKLAVTNQKWTCMTVDLE
ADKQDYPQPSDLSTFVNETKFSSPTEELDYRNSYEIEYMEKIGSSLPQDDDAPKKQALYLMFD 
TSQESPVKSSPVRMSESPTPCSGSSFEETEALVNTAAKNQHPVPRGLAPNQESHLQVPEKSSQ 
KELEAMGLGTPSEAIEITAPEGSFASADALLSRLAHPVSLCGALDYLEPDLAEKNPPLFAQKL 
QEELEFAIMRIEALKLARQIALASRSHQDAKREAAHPTDVSISKTALYSRIGTAEVEKPAGLL
FQQPDLDSALQIARAEIITKEREVSEWKDKYEESRREVMEMRKIVAEYEKTIAQMIEDEQREK
SVSHQTVQQLVLEKEQALADLNSVEKSLADLFRRYEKMKEVLEGFRKNEEVLKRCAQEYLSRV 
KKEEQRYQALKVHAEEKLDRANAEIAQVRGKAQQEQAAHQASLRKEQLRVDALERTLEQKNKE 
IEELTKICDELIAKMGKS 



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 42B.

Table 42B. Comparison of NOV42a against NOV42b.

Protein Sequence | NOV42a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV42b | 1..1057 / 1..1026 | 916/1057 (86%) / 916/1057 (86%)
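
The lengths listed for the two variants in Table 42A are mutually consistent, which can be shown with a short calculation. The sketch below uses only figures given above (DNA length, ORF stop position, protein length); the dictionary layout and variable names are illustrative assumptions.

```python
# Figures listed in Table 42A for the two NOV42 variants.
nov42a = {"dna_bp": 3779, "orf_start": 87, "orf_stop": 3258, "protein_aa": 1057}
nov42b = {"dna_bp": 3686, "orf_start": 87, "orf_stop": 3165, "protein_aa": 1026}

# Differences between the variants at the DNA and protein level.
dna_difference = nov42a["dna_bp"] - nov42b["dna_bp"]
stop_shift = nov42a["orf_stop"] - nov42b["orf_stop"]
protein_difference = nov42a["protein_aa"] - nov42b["protein_aa"]

print(dna_difference, stop_shift, protein_difference)

# An in-frame difference should satisfy: bases = 3 x residues.
in_frame = dna_difference == stop_shift == 3 * protein_difference
print(in_frame)
```

Both ORFs start at the same position, so the shift in the stop codon equals the difference in total DNA length, and that difference is a whole number of codons.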



Further analysis of the NOV42a protein yielded the following properties shown in 
Table 42C. 



Table 42C. Protein Sequence Properties NOV42a 


PSort analysis: 0.9800 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV42a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 42D.

Table 42D. Geneseq Results for NOV42a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV42a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM38678 | Human polypeptide SEQ ID NO 1823 - Homo sapiens, 1013 aa. [WO200153312-A1, 26-JUL-2001] | 24..1057 / 41..1013 | 972/1034 (94%) / 972/1034 (94%) | 0.0
AAM38680 | Human polypeptide SEQ ID NO 1825 - Homo sapiens, 1025 aa. [WO200153312-A1, 26-JUL-2001] | 24..1057 / 41..1025 | 972/1046 (92%) / 972/1046 (92%) | 0.0
AAM40466 | Human polypeptide SEQ ID NO 5397 - Homo sapiens, 865 aa. [WO200153312-A1, 26-JUL-2001] | 133..1057 / 1..865 | 853/926 (92%) / 855/926 (92%) | 0.0
AAM40465 | Human polypeptide SEQ ID NO 5396 - Homo sapiens, 865 aa. [WO200153312-A1, 26-JUL-2001] | 133..1057 / 1..865 | 853/926 (92%) / 855/926 (92%) | 0.0
AAM40464 | Human polypeptide SEQ ID NO 5395 - Homo sapiens, 865 aa. [WO200153312-A1, 26-JUL-2001] | 133..1057 / 1..865 | 853/926 (92%) / 855/926 (92%) | 0.0

In a BLAST search of public sequence databases, the NOV42a protein was found to
have homology to the proteins shown in the BLASTP data in Table 42E.



Table 42E. Public BLASTP Results for NOV42a

Protein Accession Number | Protein/Organism/Length | NOV42a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O95359 | Transforming acidic coiled-coil-containing protein 2 (Anti Zuai-1) (AZU-1) - Homo sapiens (Human), 1026 aa. | 1..1057 / 1..1026 | 1025/1057 (96%) / 1025/1057 (96%) | 0.0
Q9JJG0 | Transforming acidic coiled-coil-containing protein 2 - Mus musculus (Mouse), 1035 aa. | 1..1057 / 1..1035 | 886/1071 (82%) / 928/1071 (85%) | 0.0
Q8TCK9 | Hypothetical 88.9 kDa protein - Homo sapiens (Human), 807 aa (fragment). | 143..1057 / 1..807 | 715/918 (77%) / 742/918 (79%) | 0.0
Q8WVR1 | Hypothetical 64.7 kDa protein - Homo sapiens (Human), 575 aa. | 375..1057 / 1..575 | 484/687 (70%) / 510/687 (73%) | 0.0
Q99KQ6 | Similar to transforming, acidic coiled-coil containing protein 2 - Mus musculus (Mouse), 598 aa. | 375..1057 / 1..598 | 452/686 (65%) / 497/686 (71%) | 0.0



PFam analysis predicts that the NOV42a protein contains the domains shown in the
Table 42F.

Table 42F. Domain Analysis of NOV42a

Pfam Domain | NOV42a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 43. 

The NOV43 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 43A.



Table 43A. NOV43 Sequence Analysis

SEQ ID NO: 167 | 3351 bp

NOV43a, CG123955-01 DNA Sequence

GACTCCTGCCTCAGGATGCCGGGGGAGGAAGAGGAGCGGGCCTTCCTGGTGGCCCGCGAGGAG

CTGGCGAGCGCCCTGAGGAGGGATTCCGGGCAGGCGTTTTCCCTGGAGCAGCTCCGGCCGCTA 

CTAGCCAGCTCTCTGCCGCTAGCCGCCCGCTACCTGCAGCTGGACGCCGCACGCCTTGTCCGC 

TGCAACGCTCATGGGGAGCCCCGAAACTACCTCAACACCCTGTCCACGGCTCTGAACATCCTG 

GAGAAATACGGCCGCAACCTTCTCAGCCCTCAGCGGCCTCGGTACTGGCGTGGTGTCAAGTTT 

AATAACCCTGTCTTTCGCAGCACGGTGGATGCTGTGCAGCAGGGGGGCCGAGATGTGCTGCGA

TTATATGGCTACACAGAGGAGCAACCAGATGGGTTGAGCTTCCCCGAAGGGCAGGAGGAGCCA 

GATGAGCACCAGGTTGCTACAGTCACACTGGAAGTACTGCTGCTTCGGACAGAGCTCAGCCTG 

CTATTGCAGAATACTCATCCAAGACAGCAGGCACTGGAGCAGCTGTTGGAAGACAAGGTTGAA 

GATATGCTGCAGCTTTCAGAATTTGACCCCCTATTGAGAGAGATTGCTCCTGGCCCCCTCACC 

ACACCCTCTGTCCCAGGCTCCACTCCTGGTCCCTGCTTCCTCTGTGGTTCTGCCCCAGGCACA 

CTGCACTGCCCATCCTGTAAACAGGCCCTGTGTCCAGCCTGTGACCACCTGTTCCATGGACAC 

CCATCCCGTGCTCATCACCTCCGCCAGACCCTGCCTGGGGTCCTGCAGGGTACCCACGTGAGC 

AGTTTACCTGCCTCAGCCCAACCACGGCCCCAGTCGACCTCCCTGCTGGCCCTGGGAGACAGC 

TCTCTTTCTTCCCCTAATCCTGCAAGTGCTCATTTGCCCTGGCACTGTGCTGCCTGTGCCATG 

CTAAATGAGCCTTGGGCAGTGCTCTGTGTGGCCTGTGATCGGCCCCGAGGCTGTAAGGGGTTG

GGGTTGGGAACTGAGGGTCCCCAAGGAACTGGAGGCCTAGAACCTGATCTTGCACGGGGTCGG 

TGGGCCTGCCAGAGCTGTACCTTTGAGAATGAGGCAGCTGCTGTGCTATGTTCCATATGTGAG

CGACCTCGGCTGGCCCAGCCTCCCAGCTTGGTGGTGGATTCCCGAGATGCTGGCATTTGCCTG 

CAACCCCTTCAGCAGGGGGATGCTTTGCTGGCCTCTGCCCAGAGTCAAGTCTGGTACTGTATT 

CACTGTACCTTCTGCAACTCGAGCCCTGGCTGGGTGTGTGTTATGTGCAACCGGACTAGTAGC 

CCCATTCCAGCACAACATGCCCCCCGGCCCTATGCCAGCTCTTTGGAAAAGGGACCCCCCAAG 

CCTGGGCCCCCACGACGCCTTAGTGCCCCCCTGCCCAGTTCCTGTGGAGATCCTGAGAAGCAG 

CGCCAAGACAAGATGCGGGAAGAAGGCCTCCAGCTAGTGAGCATGATCCGGGAAGGGGAAGCC 

GCAGGTGCCTGTCCAGAGGAGATCTTCTCGGCTCTGCAGTACTCGGGCACTGAGGTGCCTCTG 

CAGTGGTTGCGCTCAGAACTGCCCTACGTCCTGGAGATGGTGGCTGAGCTGGCTGGACAGCAG 

GACCCTGGGCTGGGTGCCTTTTCCTGTCAGGAGGCCCGGAGAGCCTGGCTGGATCGTCATGGC 

AACCTTGATGAAGCTGTGGAGGAGTGTGTGAGGACCAGGCGAAGGAAGGTACAGGAGCTCCAG 

TCTCTAGGCTTTGGGCCTGAGGAGGGGTCTCTCCAGGCATTGTTCCAGCACGGAGGTGATGTG 

TCACGGGCCCTGACTGAGCTACAGCGCCAACGCCTAGAGCCCTTCCGCCAGCGCCTCTGGGAC 

AGTGGCCCTGAGCCCACCCCTTCCTGGGATGGGCCAGACAAGCAGAGCCTGGTCAGGCGGCTT 

TTGGCAGTCTACGCACTCCCCAGCTGGGGCCGGGCAGAGCTGGCACTGTCACTGCTGCAGGAG 

ACACCCAGGAACTATGAGTTGGGGGATGTGGTAGAAGCTGTGAGGCACAGCCAGGACCGGGCC 

TTCCTGCGCCGCTTGCTTGCCCAGGAGTGTGCCGTGTGTGGCTGGGCCCTGCCCCACAACCGG

ATGCAGGCCCTGACTTCCTGTGAGTGCACCATCTGTCCTGACTGCTTCCGCCAGCACTTCACC 

ATCGCCTTGAAGGAGAAGCACATCACAGACATGGTGTGCCCTGCCTGTGGCCGCCCCGACCTC 

ACCGATGACACACAGTTGCTCAGCTACTTCTCTACCCTTGACATCCAGCTTCGCGAGAGCCTA

GAGCCAGATGCCTATGCGTTGTTCCATAAGAAGCTGACCGAGGGTGTGCTGATGCGGGACCCC 

AAGTTCTTGTGGTGTGCCCAGTGCTCCTTTGGCTTCATATATGAGCGTGAGCAGCTGGAGGCA 

ACTTGTCCCCAGTGTCACCAGACCTTCTGTGTGCGCTGCAAGCGCCAGTGTGAGGACTTCCAG 

AACTGGAAACGCATGAACGACCCAGAATACCAGGCCCAGGGCCTAGCAATGTATCTTCAGGAA 

AACGGCATTGACTGCCCCAAATGCAAGTTCTCGTACGCCCTGGCCCGAGGAGGCTGCATGCAC 

TTTGACTGTACCCAGTGCCGCCACCAGTTCTGCAGCGGCTGCTACAATGCCTTTTACGCCAAG

AATAAATGTCCAGAGCCTAACTGCAGGGTGAAAAAGTCCCTGCACGGCCACCACCCTCGAGAC 

TGCCTCTTCTACCTGCGGGACTGGACTGCTCTCCGGCTTCAGAAGCTGCTACAGGACAATAAC 

GTCATGTTTAATACAGAGCCTCCAGCTGGGGCCCGGGCAGTCCCTGGAGGTGGCTGCCGAGTG 

ATAGAGCAGAAGGAGGTTCCCAATGGGCTCAGGGACGAAGCTTGTGGCAAGGAAACTCCAGCT 

GGCTATGCCGGCCTGTGCCAGGCACACTACAAAGAGTATCTTGTGAGCCTCATCAATGCCCAC 

TCGCTGGACCCAGCCACCTTGTATGAGGTGGAAGAGCTGGAGACGGCCACTGAGCGCTACCTG

CACGTACGCCCCCAGCCTTTGGCTGGAGAGGATCCCCCTGCTTACCAGGCCCGCCTTCTGCAG 

AAGCTGACAGAAGAGGTACCCTTGGGACAGAGTATCCCCCGCAGGCGGAAGTAGCTGAGGGCA 

AGGGTCCCGATGAGGGTCCCATGGCCTGCTCCCTCAGGAACAGCTCCAGCACCAATAAAGAGG 




CATCTTACCACCCAGGCTTCTTGGTGGTCCTTCTTCCTGGTGCCACCATCTAGGGGCACCAGG 




GAAAGAGCGGGG




ORF Start: ATG at 16 | ORF Stop: TAG at 3202

SEQ ID NO: 168 | 1062 aa | MW at 118400.2 kD

NOV43a, CG123955-01 Protein Sequence


MPGEEEERAFLVAREELASALRRDSGQAFSLEQLRPLLASSLPLAARYLQLDAARLVRCNAHG 
EPRNYLNTLSTALNILEKYGRNLLSPQRPRYWRGVKFNNPVFRSTVDAVQQGGRDVLRLYGYT 
EEQPDGLSFPEGQEEPDEHQVATVTLEVLLLRTELSLLLQNTHPRQQALEQLLEDKVEDMLQL 
SEFDPLLREIAPGPLTTPSVPGSTPGPCFLCGSAPGTLHCPSCKQALCPACDHLFHGHPSRAH 
HLRQTLPGVLQGTHLSSLPASAQPRPQSTSLLALGDSSLSSPNPASAHLPWHCAACAMLNEPW 
AVLCVACDRPRGCKGLGLGTEGPQGTGGLEPDLARGRWACQSCTFENEAAAVLCSICERPRLA 
QPPSLVVDSRDAGICLQPLQQGDALLASAQSQVWYCIHCTFCNSSPGWVCVMCNRTSSPIPAQ
HAPRPYASSLEKGPPKPGPPRRLSAPLPSSCGDPEKQRQDKMREEGLQLVSMIREGEAAGACP 
EEIFSALQYSGTEVPLQWLRSELPYVLEMVAELAGQQDPGLGAFSCQEARRAWLDRHGNLDEA 
VEECVRTRRRKVQELQSLGFGPEEGSLQALFQHGGDVSRALTELQRQRLEPFRQRLWDSGPEP
TPSWDGPDKQSLVRRLLAVYALPSWGRAELALSLLQETPRNYELGDVVEAVRHSQDRAFLRRL
LAQECAVCGWALPHNRMQALTSCECTICPDCFRQHFTIALKEKHITDMVCPACGRPDLTDDTQ 
LLSYFSTLDIQLRESLEPDAYALFHKKLTEGVLMRDPKFLWCAQCSFGFIYEREQLEATCPQC 
HQTFCVRCKRQCEDFQNWKRMNDPEYQAQGLAMYLQENGIDCPKCKFSYALARGGCMHFHCTQ 
CRHQFCSGCYNAFYAKNKCPEPNCRVKKSLHGHHPRDCLFYLRDWTALRLQKLLQDNNVMFNT
EPPAGARAVPGGGCRVIEQKEVPNGLRDEACGKETPAGYAGLCQAHYKEYLVSLINAHSLDPA 
TLYEVEELETATERYLHVRPQPLAGEDPPAYQARLLQKLTEEVPLGQSIPRRRK 



Further analysis of the NOV43a protein yielded the following properties shown in 
Table 43B. 



Table 43B. Protein Sequence Properties NOV43a

PSort analysis: 0.7000 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted






A search of the NOV43a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins, shown in Table 43C.



Table 43C. Geneseq Results for NOV43a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV43a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB50186 | Human transcription factor TRFX-37 - Homo sapiens, 579 aa. [WO200172777-A2, 04-OCT-2001] | 493..1062; 1..579 | 570/579 (98%); 570/579 (98%) | 0.0
ABB97407 | Novel human protein SEQ ID NO: 675 - Homo sapiens, 505 aa. [WO200222660-A2, 21-MAR-2002] | 630..1062; 73..505 | 431/433 (99%); 431/433 (99%) | 0.0
AAB92527 | Human protein sequence SEQ ID NO:10681 - Homo sapiens, 505 aa. [EP1074617-A2, 07-FEB-2001] | 630..1062; 73..505 | 430/433 (99%); 431/433 (99%) | 0.0
ABB97408 | Novel human protein SEQ ID NO: 676 - Homo sapiens, 514 aa. [WO200222660-A2, 21-MAR-2002] | 630..1062; 73..514 | 432/442 (97%); 432/442 (97%) | 0.0
ABB65643 | Drosophila melanogaster polypeptide SEQ ID NO 23721 - Drosophila melanogaster, 2421 aa. [WO200171042-A2, 27-SEP-2001] | 680..1015; 2004..2381 | 147/378 (38%); 200/378 (52%) | 7e-78

In a BLAST search of public sequence databases, the NOV43a protein was found to
have homology to the proteins shown in the BLASTP data in Table 43D.



Table 43D. Public BLASTP Results for NOV43a

Protein Accession Number | Protein/Organism/Length | NOV43a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q96EP0 | Unknown (protein for MGC:19975) - Homo sapiens (Human), 1072 aa. | 1..1062; 1..1072 | 1061/1073 (98%); 1061/1073 (98%) | 0.0
Q924T7 | FLJ10111 - Mus musculus (Mouse), 1057 aa. | 1..1062; 1..1057 | 918/1064 (86%); 957/1064 (89%) | 0.0
Q96NF1 | CDNA FLJ30980 fis, clone HHDPC2000149, highly similar to Mus musculus partial muscle protein 534 - Homo sapiens (Human), 642 aa. | 430..1062; 1..642 | 633/642 (98%); 633/642 (98%) | 0.0
Q8TEI0 | FLJ00217 protein - Homo sapiens (Human), 547 aa (fragment). | 525..1062; 1..547 | 538/547 (98%); 538/547 (98%) | 0.0
Q96GB4 | Similar to hypothetical protein FLJ10111 - Homo sapiens (Human), 539 aa. | 533..1062; 1..539 | 530/539 (98%); 530/539 (98%) | 0.0
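
The identity and similarity percentages in Table 43D follow from the match counts and the alignment length. The sketch below recomputes them under the assumption that the printed values are truncated rather than rounded; that is an inference from the numbers (957/1064 is about 89.9% but is printed as 89%), not something the text states, and the helper name `percent_truncated` is illustrative.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Percentage of `matches` over `aligned`, truncated to a whole number."""
    return (100 * matches) // aligned


# (match count, alignment length, percentage printed in Table 43D)
table_43d = [
    (1061, 1073, 98),
    (918, 1064, 86),
    (957, 1064, 89),
    (633, 642, 98),
    (538, 547, 98),
    (530, 539, 98),
]

for matches, aligned, printed in table_43d:
    # Each recomputed value can be compared with the printed one.
    print(matches, aligned, percent_truncated(matches, aligned), printed)
```

Not every table in this document need follow the same convention, so the check is best read as a consistency test for this table only.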



PFam analysis predicts that the NOV43a protein contains the domains shown in
Table 43E.



Table 43E. Domain Analysis of NOV43a

Pfam Domain | NOV43a Match Region | Identities; Similarities for the Matched Region | Expect Value
zf-B_box | 212..258 | 12/49 (24%); 29/49 (59%) | 0.19
zf-RanBP | 298..328 | 10/32 (31%); 20/32 (62%) | 0.33
PHD | 304..361 | 14/64 (22%); 40/64 (62%) | 0.041
zf-RanBP | 349..378 | 10/32 (31%); 19/32 (59%) | 0.0052

Example 44.

The NOV44 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 44A. 



Table 44A. NOV44 Sequence Analysis

SEQ ID NO: 169 | 2847 bp

NOV44a,
CG124672-01 DNA
Sequence


ACCTTTAAGCGTCACGGGTGGGGCTGCAGCTTCTGGACCTAGGACTTTGAACATGTCGCGCCT 


GAAGCGGATAGCGGGGCAGGATCTCCGCGCTGGTTTCAAAGCAGGTGGAAGAGACTGCGGTAC 
CTCGGTACCCCAAGGGCTGTTGAAGGCAGCGAGGAAGAGCGGCCAGTTAAACCTGTCGGGTAG
AAACCTCAGTGAAGTGCCGCAGTGTGTCTGGAGAATAAATGTGGATATCCCTGAGGAAGCTAA
TCAGAATCTTTCGTTTGGTGCTACTGAAAGATGGTGGGAGCAGACAGATTTGACCAAACTAAT
AATATCAAACAATAAACTTCAGTCACTTACAGATGACCTGCGACTCTTGCCTGCACTGACTGT 
TCTTGATATACATGATAATCAGTTGACATCCCTTCCTTCTGCTATAAGAGAGCTAGAAAATCT
TCAGAAACTTAATGTCAGCCATAATAAACTGAAAATACTCCCTGAAGAAATTACAAACCTAAG 
AAACCTGAAGTGCCTGTATCTCCAGCATAATGAATTAACCTGCATATCAGAGGGATTTGAACA
ACTTTCCAATTTAGAAGATTTAGATCTTTCAAACAATCATCTTACAACTGTTCCTGCTAGTTT 
TTCTTCTCTGTCCAGTCTGGTGCGACTCAATCTTTCTAGTAATGAACTGAAGAGTTTGCCAGC 
AGAAATAAATAGAATGAAAAGGTTGAAGCATTTGGATTGTAATTCAAATCTCTTGGAAACTAT 
ACCTCCTGAATTGGCTGGCATGGAATCACTAGAATTGCTTTATTTGCGGAGGAATAAATTACG 
TTTTCTACCAGAATTTCCTTCTTGTAGTCTATTGAAGGAATTGCACGTAGGTGAAAACCAGAT 
TGAAATGTTAGAGGCAGAACATCTTAAACATCTGAATTCAATTCTTGTGCTAGACCTGAGGGA 
TAACAAGTTAAAATCTGTTCCAGATGAAATTATACTACTACGGTCCTTGGAAAGGCTTGACCT 
AAGCAACAATGATATTAGTAGTCTTCCCTATTCATTGGGGAACCTTCATTTGAAATTTTTGGC 
ATTAGAAGGAAATCCTTTGAGAACAATTCGAAGAGAAATTATAAGTAAAGGAACACAAGAAGT 
CCTAAAATATCTACGAAGCAAGATCAAAGATGATGGACCTAGCCAAAGTGAGTCTGCTACTGA 
GACTGCCATGACACTACCAAGTGAATCCAGAGTCAATATACATGCCATCATTACATTAAAAAT
ATTAGACTATAGTGATAAACAAGCAACTTTGATTCCTGATGAGGTGTTTGATGCAGTAAAAAG 
CAACATCGTCACTTCTATTAACTTCAGTAAGAATCAACTATGTGAAATTCCAAAAAGGATGGT
AGAACTGAAGGAAATGGTTTCTGATGTCGATCTCAGTTTTAATAAACTTTCCTTTATATCCTT 
GGAGTTATGTGTGCTTCAGAAATTGACTTTTTTAGATCTCAGGAACAATTTTTTAAATTCTTT
GCCAGAAGAAATGGAATCACTGGTAAGACTGCAAACGATCAATCTTTCCTTTAATAGGTTTAA
AATGCTACCTGAAGTTCTATATCGTATCTTCACACTTGAAACAATTCTGATTAGTAATAATCA 
GGTTGGATCTGTGGACCCTCAGAAAATGAAGATGATGGAAAATCTGACCACGTTGGACCTTCA
AAATAATGACCTCTTACAAATTCCACCAGAGCTCGGTAATTGTGTAAACTTAAGAACATTACT 
ATTGGATGGAAATCCATTCCGAGTTCCTCGAGCAGCCATATTAATGAAAGGAACAGCTGCTAT 
ACTTGAATATTTGAGAGACCGAATTCCTACTTAACATGGAGTTGCTTTATAACCCTTGTCATG 


TATTATTAACCCTGGTTAATTCTAAGGAGGATGTAACATTTGTTTTAGTATCATCTTAAAAGG


TGATTATTGTAATTGATCTTGTAGTTTCCCAGTATCACCTACCCGTTGGTATAATTAGCCTGG 


GCCATATTCACTGCCAGTAAATATTTTTACATTTTTATTTAAGATTTTTGTAAGGTGTTGTGT 


ACATTTGTAATGGTGATAACCACAATGTGTTCATACATTTGTTCTAAATGTTTTGCTTATGAT 


TTATCCTGCTAACTTTCATTTTCTTATAGCAAGCAGTTTTTTCAAAAATGAATTTTTATTTAA 


TGTGGTTCAGTATTATAATAACAAAGCATTTTTGTAGAACTGGTTTTTTTTCTCATTTATTTT 


TGTATTCCATACAATGTGACCAATTGACTTGAATATGACTAGCCAGTTTCTATGTTTTTGTTA 


GATATAAAATTAAATCGAATTTTGTTGAATACTGTTCTTTGGCATTTAAAAAATAAGACCTTC 


TTATCTTGGGCCACATGTCAAAAGAAAAAGGAAACAAAAATATATTAAAAATAAGACTTTTCA 


TTACCCATGATAGGACTTTTGTGATATGGCTAATCTCAGTACACATTTCAACTTAAAACCTTT 




TTATTTACAGCACCATAATTTTAAAATTTACTTGCAATCTTGGTAAGACTAAACTTGCAGTGT


TTTTCTAAAAGGGAATTTGATAGGTAAACTTGATTTAATAAAAATTAAATATCATTTTTGTTT 


ACACCAAAATTATCAGAAGTAGGTTGATTAGTCATTATAACACTTACCATATGATTCTATTAA 


GAAGTCAATTCAGTAGCATGTATATCAATTTATATAGATAGGTAGATAGCTTTTGGATGATTG 


AGGCATGCTTATATTATGAAAAAAATTGCTAATAAAGATAAATACTACATGTTCAGAATAAAA


GTTACATTTTTC 




ORF Start: ATG at 53 | ORF Stop: TAA at 1859

SEQ ID NO: 170 | 602 aa | MW at 68248.9kD

NOV44a,
CG124672-01 Protein
Sequence


MSRLKRIAGQDLRAGFKAGGRDCGTSVPQGLLKAARKSGQLNLSGRNLSEVPQCVWRINVDIP
EEANQNLSFGATERWWEQTDLTKLIISNNKLQSLTDDLRLLPALTVLDIHDNQLTSLPSAIRE 
LENLQKLNVSHNKLKILPEEITNLRNLKCLYLQHNELTCISEGFEQLSNLEDLDLSNNHLTTV 
PASFSSLSSLVRLNLSSNELKSLPAEINRMKRLKHLDCNSNLLETIPPELAGMESLELLYLRR 
NKLRFLPEFPSCSLLKELHVGENQIEMLEAEHLKHLNSILVLDLRDNKLKSVPDEIILLRSLE
RLDLSNNDISSLPYSLGNLHLKFLALEGNPLRTIRREIISKGTQEVLKYLRSKIKDDGPSQSE 
SATETAMTLPSESRVNIHAIITLKILDYSDKQATLIPDEVFDAVKSNIVTSINFSKNQLCEIP
KRMVELKEMVSDVDLSFNKLSFISLELCVLQKLTFLDLRNNFLNSLPEEMESLVRLQTINLSF 
NRFKMLPEVLYRIFTLETILISNNQVGSVDPQKMKMMENLTTLDLQNNDLLQIPPELGNCVNL 
RTLLLDGNPFRVPRAAILMKGTAAILEYLRDRIPT 


SEQ ID NO: 171 | 2712 bp

NOV44b,
CG124672-03 DNA
Sequence


ACCTTTAAGCGTCACGGGTGGGGCTGCAGCTTCTGGACCTAGGACTTTGAACATGTCGCGCCT 


GAAGCGGATAGCGGGGCAGGATCTCCGCGCTGGTTTCAAAGCAGGTGGAAGAGACTGCGGTAC 
CTCGGTACCCCAAGGGCTGTTGAAGGCAGCGAGGAAGAGCGGCCAGTTAAACCTGTCGGGTAG
AAACCTCAGTGAAGTGCCGCAGTGTGTCTGGAGAATAAATGTGGATATCCCTGAGGAAGCTAA 
TCAGAATCTTTCGTTTGGTGCTACTGAAAGATGGTGGGAGCAGACAGATTTGACCAAACTAAT 
AATATCAAACAATAAACTTCAGTCACTTACAGATGACCTGCGACTCTTGCCTGCACTGACTGT 
TCTTGATATACATGATAATCAGTTGACATCCCTTCCTTCTGCTATAAGAGAGCTAGAAAATCT 
TCAGAAACTTAATGTCAGCCATAATAAACTGAAAATACTCCCTGAAGAAATTACAAACCTAAG 
AAACCTGAAGTGCCTGTATCTCCAGCATAATGAATTAACCTGCATATCAGAGGGATTTGAACA 
ACTTTCCAATTTAGAAGATTTAGATCTTTCAAACAATCATCTTACAACTGTTCCTGCTAGTTT 
TTCTTCTCTGTCCAGTCTGGTGCGACTCAATCTTTCTAGTAATGAACTGAAGAGTTTGCCAGC 
AGAAATAAATAGAATGAAAAGGTTGAAGCATTTGGATTGTAATTCAAATCTCTTGGAAACTAT 
ACCTCCTGAATTGGCTGGCATGGAATCACTAGAATTGCTTTATTTGCGGAGGAATAAATTACG 
TTTTCTACCAGAATTTCCTTCTTGTAGTCTATTGAAGGAATTGCACGTAGGTGAAAACCAGAT 
TGAAATGTTAGAGGCAGAACATCTTAAACATCTGAATTCAATTCTTGTGCTAGACCTGAGGGA 
TAACAAGTTAAAATCTGTTCCAGATGAAATTATACTACTACGGTCCTTGGAAAGGCTTGACCT 
AAGCAACAATGATATTAGTAGTCTTCCCTATTCATTGGGGAACCTTCATTTGAAATTTTTGGC 
ATTAGAAGGAAATCCTTTGAGAACAATTCGAAGAGAAATTATAAGTAAAGGAACACAAGAAGT 
CCTAAAATATCTACGAAGCAAGATCAAAGATGATGGACCTAGCCAAAGTGAGTCTGCTACTGA 
GACTGCCATGACACTACCAAGTGAATCCAGAGTCAATATACATGCCATCATTACATTAAAAAT 
ATTAGACTATAGTGATAAACAAGCAACTTTGATTCCTGATGAGGTGTTTGATGCAGTAAAAAG
CAACATCGTCACTTCTATTAACTTCAGTAAGAATCAACTATGTGAAATTCCAAAAAGGATGGT 
AGAACTGAAGGAAATGGTTTCTGATGTCGATCTCAGTTTTAATAAACTTTCCTTTATATCCTT 
GGAGTTATGTGTGCTTCAGAAATTGACTTTTTTAGATCTCAGGAACAATTTTTTAAATTCTTT 
GCCAGAAGAAATGGAATCACTGGTAAGACTGCAAACGATCAATCTTTCCTTTAATAGGTTTAA 
AATGCTACCTGAAGTTCTATATCGTATCTTCACACTTGAAACAATTCTGATTAGTAATAATCA 
GGTTGGATCTGTGGACCCTCGAGCAGCCATATTAATGAAAGGAACAGCTGCTATACTTGAATA 
TTTGAGAGACCGAATTCCTACTTAACATGGAGTTGCTTTATAACCCTTGTCATGTATTATTAA 


CCCTGGTTAATTCTAAGGAGGATGTAACATTTGTTTTAGTATCATCTTAAAAGGTGATTATTG 


TAATTGATCTTGTAGTTTCCCAGTATCACCTACCCGTTGGTATAATTAGCCTGGGCCATATTC 


ACTGCCAGTAAATATTTTTACATTTTTATTTAAGATTTTTGTAAGGTGTTGTGTACATTTGTA 


ATGGTGATAACCACAATGTGTTCATACATTTGTTCTAAATGTTTTGCTTATGATTTATCCTGC


TAACTTTCATTTTCTTATAGCAAGCAGTTTTTTCAAAAATGAATTTTTATTTAATGTGGTTCA 


GTATTATAATAACAAAGCATTTTTGTAGAACTGGTTTTTTTTCTCATTTATTTTTGTATTCCA 


TACAATGTGACCAATTGACTTGAATATGACTAGCCAGTTTCTATGTTTTTGTTAGATATAAAA 


TTAAATCGAATTTTGTTGAATACTGTTCTTTGGCATTTAAAAAATAAGACCTTCTTATCTTGG 


GCCACATGTCAAAAGAAAAAGGAAACAAAAATATATTAAAAATAAGACTTTTCATTACCCATG 


ATAGGACTTTTGTGATATGGCTAATCTCAGTACACATTTCAACTTAAAACCTTTTTATTTACA 


GCACCATAATTTTAAAATTTACTTGCAATCTTGGTAAGACTAAACTTGCAGTGTTTTTCTAAA 


AGGGAATTTGATAGGTAAACTTGATTTAATAAAAATTAAATATCATTTTTGTTTACACCAAAA 


TTATCAGAAGTAGGTTGATTAGTCATTATAACACTTACCATATGATTCTATTAAGAAGTCAAT 


TCAGTAGCATGTATATCAATTTATATAGATAGGTAGATAGCTTTTGGATGATTGAGGCATGCT 




TATATTATGAAAAAAATTGCTAATAAAGATAAATACTACATGTTCAGAATAAAAGTTACATTT 


TTC 




ORF Start: ATG at 53 | ORF Stop: TAA at 1724

SEQ ID NO: 172 | 557 aa | MW at 63114.9kD

NOV44b,
CG124672-03 Protein
Sequence


MSRLKRIAGQDLRAGFKAGGRDCGTSVPQGLLKAARKSGQLNLSGRNLSEVPQCVWRINVDIP 
EEANQNLSFGATERWWEQTDLTKLIISNNKLQSLTDDLRLLPALTVLDIHDNQLTSLPSAIRE 
LENLQKLNVSHNKLKILPEEITNLRNLKCLYLQHNELTCISEGFEQLSNLEDLDLSNNHLTTV 
PASFSSLSSLVRLNLSSNELKSLPAEINRMKRLKHLDCNSNLLETIPPELAGMESLELLYLRR 
NKLRFLPEFPSCSLLKELHVGENQIEMLEAEHLKHLNSILVLDLRDNKLKSVPDEIILLRSLE 
RLDLSNNDISSLPYSLGNLHLKFLALEGNPLRTIRREIISKGTQEVLKYLRSKIKDDGPSQSE 
SATETAMTLPSESRVNIHAIITLKILDYSDKQATLIPDEVFDAVKSNIVTSINFSKNQLCEIP
KRMVELKEMVSDVDLSFNKLSFISLELCVLQKLTFLDLRNNFLNSLPEEMESLVRLQTINLSF 
NRFKMLPEVLYRIFTLETILISNNQVGSVDPRAAILMKGTAAILEYLRDRIPT 




SEQ ID NO: 173 | 982 bp

NOV44c,
CG124672-02 DNA
Sequence

CTGCAGCTTCTGGACCTAGGACTTTGAACATGTCGCGCCTGAAGCGGATAGCGGGGCAGGATC
TCCGCGCTGGTTTCAAAGCAGGTGGAAGAGACTGCGGTACCTCGGTACCCCAAGGGCTGTTGA


AGGCAGCGAGGAAGAGCGGCCAGTTAAACCTGTCGGGTAGAAACCTCAGTGAAGTGCCGCAGT
GTGTCTGGAGAATAAATGTGGATATCCCTGAGGAAGCTAATCAGAATCTTTCGTTTGGTGCTA
CTGAAAGATGGTGGGAGCAGACAGATTTGACCAAACTAATAATATCAAACAATAAACTTCAGT
CACTTACAGATGACCTGCGACTCTTGCCTGCACTGACTGTTCTTGATATACATGATAATCAGT
TGACATCCCTTCCTTCTGCTATAAGAGAGCTAGAAAATCTTCAGAAACTTAATGTCAGCCATA
ATAAACTGAAAATACTCCCTGAAGAAATTACAAACCTAAGAAACCTGAAGTGCCTGTATCTCC
AGCATAATGAATTAACCTGCATATCAGAGGGATTTGAACAACTTTCCAATTTAGAAGATTTAG 
ATCCTTCAAACAATCATCTTACAACTGTTCCTGCTAGTTTTTCTTCTCTGTCCAGTCTGGTAA 
GACTGCAAACGATCAATCTTTCCTTTAATAGGTTTAAAATGCTACCCGAAGTTCTATATCGTA 
TCTTCACACTTGAAACAATTCTGATTAGTAATAATCAGGTTGGATCTGTGGACCCTCAGAAAA
TGAAGATGATGGAAAATCTGACCACGTTGGACCTTCAAAATAATGACCTCTTACAAATTCCAC 
CAGAGCTCGGTAATTGTGTAAACTTAAGAACATTACTACTGGATGGAAATCCATTCCGAGTTC 
CTCGAGCAGCCATATTAATGAAAGGAACAGCTGCTATACTTGAATATTTGAGAGACCGAATTC
CTACTTAACATGGAGTTGCTTTATAACCCTTGTCATG 




ORF Start: ATG at 30 | ORF Stop: TAA at 951

SEQ ID NO: 174 | 307 aa | MW at 34561.3kD

NOV44c,
CG124672-02 Protein
Sequence


MSRLKRIAGQDLRAGFKAGGRDCGTSVPQGLLKAARKSGQLNLSGRNLSEVPQCVWRINVDIP 
EEANQNLSFGATERWWEQTDLTKLIISNNKLQSLTDDLRLLPALTVLDIHDNQLTSLPSAIRE 
LENLQKLNVSHNKLKILPEEITNLRNLKCLYLQHNELTCISEGFEQLSNLEDLDPSNNHLTTV
PASFSSLSSLVRLQTINLSFNRFKMLPEVLYRIFTLETILISNNQVGSVDPQKMKMMENLTTL
DLQNNDLLQIPPELGNCVNLRTLLLDGNPFRVPRAAILMKGTAAILEYLRDRIPT



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 44B. 



Table 44B. Comparison of NOV44a against NOV44b and NOV44c.

Protein Sequence | NOV44a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV44b | 1..602; 1..557 | 525/602 (87%); 525/602 (87%)
NOV44c | 1..298; 1..303 | 225/303 (74%); 244/303 (80%)
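
The length differences among the NOV44 variants can be related to their ORF coordinates. The sketch below is a consistency check only, assuming 1-based coordinates with the stop position at the first base of the stop codon (a convention the text does not spell out); the dictionary and helper names are illustrative.

```python
# (ORF start, ORF stop, reported protein length) as listed for each variant
variants = {
    "NOV44a": (53, 1859, 602),
    "NOV44b": (53, 1724, 557),
    "NOV44c": (30, 951, 307),
}


def coding_length(start: int, stop: int) -> int:
    """Bases from the start codon up to, but not including, the stop codon."""
    return stop - start


for name, (start, stop, aa) in variants.items():
    nt = coding_length(start, stop)
    # In-frame ORFs have a coding length divisible by 3 that matches the protein length.
    print(name, nt, nt % 3 == 0, nt // 3 == aa)

# Difference between the NOV44a and NOV44b coding regions
delta_nt = coding_length(53, 1859) - coding_length(53, 1724)
print(delta_nt, delta_nt // 3)  # 135 45
```

The 135-base difference is a multiple of three, which is consistent with NOV44b being 45 residues shorter than NOV44a (602 versus 557 aa) while sharing the same reading frame.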




Further analysis of the NOV44a protein yielded the following properties shown in 
Table 44C. 



Table 44C. Protein Sequence Properties NOV44a

PSort analysis | 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis | No Known Signal Sequence Predicted



A search of the NOV44a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins, shown in Table 44D.

Table 44D. Geneseq Results for NOV44a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV44a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB97457 | Novel human protein SEQ ID NO: 725 - Homo sapiens, 602 aa. [WO200222660-A2, 21-MAR-2002] | 1..602; 1..602 | 602/602 (100%); 602/602 (100%) | 0.0
AAB95223 | Human protein sequence SEQ ID NO:17348 - Homo sapiens, 602 aa. [EP1074617-A2, 07-FEB-2001] | 1..602; 1..602 | 602/602 (100%); 602/602 (100%) | 0.0
AAB73695 | Human Ras-binding protein 66 - Homo sapiens, 602 aa. [WO200138367-A1, 31-MAY-2001] | 1..602; 1..602 | 602/602 (100%); 602/602 (100%) | 0.0
AAU20409 | Human secreted protein, Seq ID No 401 - Homo sapiens, 215 aa. [WO200155326-A2, 02-AUG-2001] | 131..345; 1..215 | 215/215 (100%); 215/215 (100%) | e-118
ABB03069 | Human expressed polypeptide SEQ ID NO 42 - Homo sapiens, 215 aa. [WO200155167-A1, 02-AUG-2001] | 131..345; 1..215 | 215/215 (100%); 215/215 (100%) | e-118




In a BLAST search of public sequence databases, the NOV44a protein was found to
have homology to the proteins shown in the BLASTP data in Table 44E.


Table 44E. Public BLASTP Results for NOV44a

Protein Accession Number | Protein/Organism/Length | NOV44a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9H9A6 | CDNA FLJ12889 fis, clone NT2RP2004098, weakly similar to adenylate cyclase (EC 4.6.1.1) (Unknown) (Protein for MGC:16864) - Homo sapiens (Human), 602 aa. | 1..602; 1..602 | 602/602 (100%); 602/602 (100%) | 0.0
Q9HCZ4 | DJ677H15.1 (A novel protein similar to leucine-rich repeat proteins) - Homo sapiens (Human), 601 aa. | 1..602; 1..601 | 601/602 (99%); 601/602 (99%) | 0.0
Q9BTR7 | Hypothetical 43.9 kDa protein - Homo sapiens (Human), 384 aa. | 219..602; 1..384 | 384/384 (100%); 384/384 (100%) | 0.0
Q9NXC1 | CDNA FLJ20331 fis, clone HEP10410 - Homo sapiens (Human), 384 aa. | 219..602; 1..384 | 383/384 (99%); 383/384 (99%) | 0.0
Q9CRC8 | 2610040E16Rik protein (Similar to hypothetical protein FLJ20331) - Mus musculus (Mouse), 384 aa. | 219..601; 1..383 | 321/383 (83%); 355/383 (91%) | 0.0



PFam analysis predicts that the NOV44a protein contains the domains shown in
Table 44F.



Table 44F. Domain Analysis of NOV44a

Pfam Domain | NOV44a Match Region | Identities; Similarities for the Matched Region | Expect Value
LRR | 106..128 | 9/25 (36%); 20/25 (80%) | 0.0097
LRR | 129..151 | 11/25 (44%); 20/25 (80%) | 0.005
LRR | 152..174 | 8/25 (32%); 20/25 (80%) | 0.016
LRR | 175..197 | 14/25 (56%); 22/25 (88%) | 2.4e-05
LRR | 198..220 | 10/25 (40%); 19/25 (76%) | 0.013
LRR | 221..243 | 8/25 (32%); 20/25 (80%) | 0.093
LRR | 244..265 | 9/25 (36%); 19/25 (76%) | 0.36
LRR | 313..335 | 16/25 (64%); 21/25 (84%) | 0.00016
LRR | 473..495 | 12/25 (48%); 19/25 (76%) | 0.05
LRR | 543..565 | 13/25 (52%); 20/25 (80%) | 0.0012
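
Several of the LRR match regions in Table 44F abut one another, so they can be grouped into tandem blocks. The sketch below merges regions whose start immediately follows the previous end; the helper name `merge_tandem` and the strict adjacency rule are assumptions made for illustration, not part of the PFam analysis itself.

```python
def merge_tandem(regions):
    """Merge regions where one starts directly after the previous one ends."""
    merged = []
    for start, end in sorted(regions):
        if merged and start == merged[-1][1] + 1:
            # Extend the current block when the repeats are back to back.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# NOV44a LRR match regions as listed in Table 44F
lrr_regions = [
    (106, 128), (129, 151), (152, 174), (175, 197), (198, 220),
    (221, 243), (244, 265), (313, 335), (473, 495), (543, 565),
]

blocks = merge_tandem(lrr_regions)
print(blocks)  # [(106, 265), (313, 335), (473, 495), (543, 565)]
```

Read this way, the first seven repeats form one contiguous block spanning residues 106 to 265, followed by three isolated repeats.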




Example 45. 

The NOV45 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 45A. 



Table 45A. NOV45 Sequence Analysis 




SEQ ID NO: 175 | 2797 bp

NOV45a,
CG125900-01 DNA
Sequence

ATGGCCGGCACGCGCTGGGTACTCGGGGCGCTGCTCCGGGGCTGCGGCTGTAACTGCAGCAGC
TGCCGGCGCACCGGCGCCGCCTGCCTGCCCTTCTACTCCGCCGCTGGCTCTATCCCGTCGGGC


GTCTCGGGCCGCCGCCGCCTGCTGCTGCTGCTCGGGGCCGCCGCGGCCGCTGCCTCCCAGACG 
CGTGGCCTCCAGACCGGGCCTGTGCCTCCCGGGAGGCTGGCGGGGCCTCCCGCTGTGGCCACC 
TCTGCCGCGGCCGCGGCCGCCGCGTCCTACCCTGCCCTCCGTGCCTCTCTGCTGCCGCAGTCG 
CTGGCGGCGGCGGCCGCCGTCCCGACGCGCAGCTACAGCCAGGAGTCCAAAACTACTTACCTG 
GAAGACCTTCCACCACCCCCTGAGTATGAATTGGCCCCGTCCAAGTTAGAAGAGGAAGTGGAT 
GATGTCTTTCTCATTCGAGCTCAAGGACTGCCCTGGTCATGCACTATGGAAGATGTGCTTAAC 
TTTTTTTCAGACTGCAGAATCCGCAACGGTGAGAATGGAATACATTTTCTCCTAAACAGAGAT 
GGGAAACGAAGGGGTGATGCCTTAATTGAAATGGAGTCAGAGCAGGATGTGCAGAAAGCCTTA
GAGAAGCACCGCATGTACATGGGCCAGCGGTATGTGGAAGTATATGAGATAAACAATGAAGAT 
GTGGATGCCTTAATGAAGAGCTTGCAGGTCAAATCTTCGCCTGTGGTAAATGATGGTGTGGTT
CGTTTGAGAGGACTTCCTTATAGTTGCAATGAGAAAGACATTGTAGACTTCTTTGCAGGACTG 
AATATAGTTGACATTACTTTTGTGATGGACTATAGAGGGAGGCGAAAAACAGGGGAAGCCTAT 
GTGCAATTTGAAGAACCAGAAATGGCCAACCAAGCCCTGTTGAAACACAGGGAAGAAATTGGT
AATCGATACATCGAGATATTTCCAAGCAGAAGGAATGAAGTTCGAACACATGTCGGTTCTTAT 
AAGGGAAAGAAAATCGCATCTTTTCCTACTGCTAAGTATATAACTGAGCCAGAAATGGTCTTT 
GAAGAACATGAAGTAAATGAGGTATTTCAACCCATGACAGCTTTTGAAAGTGAGAAGGAAATA
GAATTGCCTAAGGAGGTGCCAGAAAAGCTTCCAGAGGCTGCTGATTTTGGAACTACGTCTTCT 
CTGCATTTTGTCCACATGAGAGGATTACCTTTCCAAGCCAATGCCCAAGACATTATAAACTTT
TTTGCTCCACTCAAGCCTGTTAGAATCACCATGGAATACAGCTCCAGTGGGAAGGCCACTGGA 
GAAGCTGATGTGCACTTTGAGACCCATGAGGATGCTGTTGCAGCGATGCTCAAGGATCGGTCC 
CACGTTCATCATAGGTATATTGAACTGTTCCTGAATTCATGTCCAAAAGGAAAATAAGACTCT 
AGGGGCTCCAGATAATAAGGGTGAAGCAAGAAGCATTTCATTTGCACATCTTTCTTGGACTTG 


GGATATACAGTTCCAGTTTATTAGCAGCAACTGCTAGGGAAATGATTTTGGTGTTTTGGGTTA 


ATTGCTTCTAAGAAAAGTTTCATAGTGGACTGTTTAGAAGAAGAAATGAAAGATCCAGTTTGG 


GATTATGAAATAAACCACAAATTAAAATTTTTGTTTAAACTGTCCAGGATCTGATTTAAAAAT


ATGGTCTTTGTTTTATATGATTAAATGGTTTGTTTTCATAGATGATATGTTACTCATTGTAAA


GACCACATATTTTTATTCAGCAGTGTTCTTTAAACGGTTTCATTTAAAAAGTAACTTTTTTTT 


TTTGCCTGTGAATTGAGTGCTCTGATGTAAAACTTCTCATGGAGTGAAACAGTGATTTATTTT


AACCAAACATTCACCAAAGCAAAGAACGGTTTCAGACCTTTGAACTGGTATGGTTTGGCAGAA 


TAGTTTTAAATTTTGCTGTATTTGATTACTTAGAGATAGGAATTTTTAAAAATCAAAACAAAA


AATACCACAGCTTAGTGTAAATGACAATTTGGCGGTTTTATGTCTTTAGAAATGTTTTGCCTT 


TCTAAGCCTTGTGCTAAAGGCGTATAACGGTGGTGCCTATCTACTTAAGGGGGCATTCTAGTC 


TTAACTTAAAAGTTGTCTAAACTGTCCCTCCCTGGCTTTTTTTGGTTTGGGGTAGACCTAAGG 


GTGTTTGTTAGTCTCAAAACTGTGAAGTGACATGTCAGAACAGTCCAGACTGGTAAGAAAATT 


AATGGCTTCACTTGAATTTAAACCAGCTCTAGATAGGAAAAAAATCAGTCTCCTCATTTGCTT 


TTTAAATGGAGTAGTACATCCCATATTTTAGAACAAGTAGGGGTGCCTTGCTTAAATAAAAAT 


AGCATTTAATGTATAATTGTGTGAAGGGTTTATGGATAAAGCTGTACTTCTGTCACAATGTGG 


CAGTACTTTCTGCTTTAATATTAAACAGCTTGTTATTTAAATATTGGACAAAATGGCTGGCTT 


CAAAATATAGTCATTAATAAACTAACTTTATGTGCACCTGTGTAGGAGAATCAAAATCCTGTA 


TGCTTTCTTTGCCTTGTTCCTGTTCTCAGGGTGACGACTGCCACCAGGAGATGCAGTTCTAGT 


TCTTAAAATTAAATTTGCCCAGGTTTCTGACAGGTGATACCTGGAAGAGAGACTATGTCTTCT




CTTACTTAATACATAACCATCTTTGATTACCAGCTAAGATGCGAAATCACTGTACTfiTAC;TrA 


ATAAATGAAGACTTGTTTCAGGCTG 




ORF Start: ATG at 1 | ORF Stop: TAA at 1441

SEQ ID NO: 176 | 480 aa | MW at 53153.8kD

NOV45a,
CG125900-01 Protein
Sequence


MAGTRWVLGALLRGCGCNCSSCRRTGAACLPFYSAAGSIPSGVSGRRRLLLLLGAAAAAASQT
RGLQTGPVPPGRLAGPPAVATSAAAAAAASYPALRASLLPQSLAAAAAVPTRSYSQESKTTYL 
EDLPPPPEYELAPSKLEEEVDDVFLIRAQGLPWSCTMEDVLNFFSDCRIRNGENGIHFLLNRD 
GKRRGDALIEMESEQDVQKALEKHRMYMGQRYVEVYEINNEDVDALMKSLQVKSSPVVNDGVV 
RLRGLPYSCNEKDIVDFFAGLNIVDITFVMDYRGRRKTGEAYVQFEEPEMANQALLKHREEIG
NRYIEIFPSRRNEVRTHVGSYKGKKIASFPTAKYITEPEMVFEEHEVNEVFQPMTAFESEKEI 
ELPKEVPEKLPEAADFGTTSSLHFVHMRGLPFQANAQDIINFFAPLKPVRITMEYSSSGKATG 
EADVHFETHEDAVAAMLKDRSHVHHRYIELFLNSCPKGK 
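
The relationship between the NOV45a DNA and protein sequences can be illustrated by translating the start of the ORF with the standard genetic code. This is a minimal sketch using only the Python standard library; the `translate` helper is illustrative, and the input is just the first 63 bases of SEQ ID NO: 175, which begin at the ATG at position 1.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First line of the NOV45a DNA sequence (bases 1 to 63)
first_line = "ATGGCCGGCACGCGCTGGGTACTCGGGGCGCTGCTCCGGGGCTGCGGCTGTAACTGCAGCAGC"
print(translate(first_line))  # MAGTRWVLGALLRGCGCNCSS
```

The result matches the first 21 residues of the NOV45a protein sequence. Stripping spaces before translation also makes the helper tolerant of the spacing artifacts that appear in some sequence lines.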



Further analysis of the NOV45a protein yielded the following properties shown in 
Table 45B. 






Table 45B. Protein Sequence Properties NOV45a

PSort analysis | 0.7929 probability located in mitochondrial intermembrane space; 0.7600 probability located in nucleus; 0.4699 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome)
SignalP analysis | Cleavage site between residues 62 and 63



A search of the NOV45a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins, shown in Table 45C.



Table 45C. Geneseq Results for NOV45a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV45a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB92890 | Human protein sequence SEQ ID NO:11499 - Homo sapiens, 415 aa. [EP1074617-A2, 07-FEB-2001] | 142..475; 3..363 | 149/362 (41%); 215/362 (59%) | 5e-71
AAU84338 | Protein HNRPH2 differentially expressed in breast cancer tissue - Homo sapiens, 460 aa. [WO200210436-A2, 07-FEB-2002] | 142..332; 14..205 | 97/193 (50%); 141/193 (72%) | 3e-51
ABB50269 | hNRP H1 ovarian tumour marker protein, SEQ ID NO:26 - Homo sapiens, 449 aa. [WO200175177-A2, 11-OCT-2001] | 142..332; 3..194 | 99/193 (51%); 138/193 (71%) | 3e-51
AAG00751 | Human secreted protein, SEQ ID NO: 4832 - Homo sapiens, 308 aa. [EP1033401-A2, 06-SEP-2000] | 142..332; 3..194 | 99/193 (51%); 138/193 (71%) | 3e-51
ABG02074 | Novel human diagnostic protein #2065 - Homo sapiens, 479 aa. [WO200175067-A2, 11-OCT-2001] | 142..332; 3..194 | 98/193 (50%); 136/193 (69%) | 2e-50



In a BLAST search of public sequence databases, the NOV45a protein was found to
have homology to the proteins shown in the BLASTP data in Table 45D.



Table 45D. Public BLASTP Results for NOV45a

Protein Accession Number | Protein/Organism/Length | NOV45a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q12849 | G-rich sequence factor-1 (GRSF-1) - Homo sapiens (Human), 424 aa. | 1..480; 1..424 | 413/480 (86%); 414/480 (86%) | 0.0
BAC03513 | CDNA FLJ33436 fis, clone BRACE2021478, highly similar to G-RICH SEQUENCE FACTOR-1 - Homo sapiens (Human), 415 aa. | 85..480; 20..415 | 387/396 (97%); 389/396 (97%) | 0.0
S48081 | GRSF-1 protein - human, 331 aa (fragment). | 150..480; 1..331 | 331/331 (100%); 331/331 (100%) | 0.0
P70333 | Heterogeneous nuclear ribonucleoprotein H' (HnRNP H') (FTP-3) - Mus musculus (Mouse), 449 aa. | 142..332; 3..194 | 97/193 (50%); 140/193 (72%) | 7e-51
O35737 | Heterogeneous nuclear ribonucleoprotein H - Mus musculus (Mouse), 449 aa. | 142..332; 3..194 | 99/193 (51%); 138/193 (71%) | 7e-51



PFam analysis predicts that the NOV45a protein contains the domains shown in
Table 45E.



Table 45E. Domain Analysis of NOV45a

Pfam Domain | NOV45a Match Region | Identities; Similarities for the Matched Region | Expect Value
rrm | 152..224 | 26/83 (31%); 50/83 (60%) | 7.5e-05
rrm | 252..321 | 21/77 (27%); 47/77 (61%) | 1e-11
rrm | 403..471 | 18/77 (23%); 44/77 (57%) | 0.017

Example 46.

The NOV46 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 46A. 



Table 46A. NOV46 Sequence Analysis 




SEQ ID NO: 177 | 2053 bp

NOV46a,
CG126510-01 DNA
Sequence


CACCTGCTGCCCACCACCCCGGCAGCACCTTTCCCTGCCCAGGCTTCAGAGTGCCCTGTTGCT 


GCTGCCACTGCCCCCCACACTCCAGGGCCATGTCAGAGCTCCCATCTACCCTCCACCAGCATG 


CCGCTCCTGAAGATGCCCCCACCATTCTCGGGGTGCAGCCACCCCTGCAGCGGGCACTGTGGT 
GGGCACTGCAGTGGGCCTCTCCTCCCACCCCCGAGCTCTCAGCCACTCCCTAGCACTCACAGG 
GATCCCGGGTGCAAGGGGCACAAGTTTGCACACAGTGGCCTGGCTTGCCAGCTGCCCCAGCCC 
TGCGAGGCAGATGAGGGGCTGGGTGAGGAAGAGGATAGCAGCTCTGAGCGAAGCTCCTGCACC
TCATCCTCCACCCACCAGAGAGATGGGAAGTTCTGTGACTGCTGCTACTGTGAGTTCTTCGGC


CACAATGCGGAAAAGGAGAAGGCCCAGTTGGCAGCAGAAGCTCTAAAGCAGGCAAATCGTGTT 
TCTGGAAGCCGGGAGCCAAGGCCTGCCAGGGAGAGGCTCTTGGAGTGGCCCGACCGGGAACTG 
GATCGGGTCAACAGCTTCCTGAGCAGCCGTCTGCAGGAGATCAAAAACACTGTCAAAGACTCC
ATCCGTGCCAGCTTCAGTGTGTGTGAGCTCAGCATGGACAGCAATGGCTTCTCTAAGGAGGGG 
GCTGCTGAGCCTGAGCCTCAGAGTCTACCCCCCTCAAACCTCAGTGGCTCCTCAGAGCAGCAG 
CCTGACATCAACCTTGACCTGTCCCCTTTGACTTTGGGCTCCCCTCAGAACCACACGTTACAA 
GCTCCAGGCGAGCCAGCCCCACCATGGGCAGAAATGAGAGGCCCCCACCCACCATGGACAGAG 
GTGAGGGGGCCCCCTCCCGGTATCGTCCCCGAGAACGGGCTCGTGAGGAGACTCAACACCGTG 
CCCAACCTATCCCGGGTGATCTGGGTCAAGACACCCAAGCCGGGCTACCCCAGCTCCGAGGAG 
CCAAGCTCAAAGGAAGTTCCCAGTTGCAAGCAGGAGCTGCCTGAGCCTGTGTCCTCAGGTGGG 
AAGCCACAGAAGGGCAAGAGGCAGGGCAGTCAGGCCAAGAAGAGCGAGGCAAGCCCAGCCCCC 
CGGCCCCCAGCCAGCCTAGAGGTTCCCAGTGCCAAGGGCCAGGTCGCTGGCCCCAAGCAGCCA
GGCAGGGTCCTAGAGCTTCCCAAAGTAGGCAGCTGTGCTGAGGCTGGAGAGGGGAGCCGGGGG 
AGCCGGCCAGGACCAGGTTGGGCTGGCAGTCCCAAAACTGAGAAGGAGAAGGGCAGCTCCTGG 
CGAAACTGGCCAGGCGAGGCCAAGGCACGGCCTCAGGAGCAGGAGTCTGTGCAGCCCCCAGGC 
CCAGCAAGGCCACAGAGCTTGCCCCAGGGCAAGGGCCGCAGCCGCCGGAGCCGCAACAAGCAG 
GAGAAGCCAGCCTCCTCCTTGGACGATGTGTTCCTGCCCAAGGACATGGACGGGGTGGAGATG 
GATGAGACTGACCGAGAGGTGGAGTACTTTAAGAGGTTCTGTTTGGATTCTGCAAAGCAGACT
CGTCAGAAAGTTGCTGTGAACTGGACCAACTTCAGCCTCAAGAAAACCACTCCTAGCACAGCT 
CAGTGAGGCCCTGCCCAGGCTGAGCTGCTTCAGGGCATCCTGAGGCCCTGACTGCCAGCTGAA 


GGCGTATAATTTTTCCCTCCGTGTGCCCCACNTACCCGTCCAAGACCCTCTGTGCTCCCCACC 


ATCCTGGACCAACCAAAAGCTGAACGGATGCCACACTGTGCTGGGGCCCCTTGACCTCAGCAG 


AGCCGCTTCCTGGTGCTACGCAGCCTCCACACTCAGAGCCCGTGGACTGGGCTGGCCTAAGGG 


CCAGGGCTGATGGTACTGCTGGCCCAACACTGCTCTCTTTGTGTTTGGTTTTTTTGTTTTTGT 


TTTTATTTTGTTTTTTTCCAATTCTTTACTTTTGATACTGTGAAGATCTTTCGTGCCGAAAGA 


TAAAGCAACATTTGGACACAGAAAAAAAAAAAAAAAA 


ORF Start: ATG at 124 | ORF Stop: TGA at 1642

SEQ ID NO: 178 | 506 aa | MW at 54743.6kD

NOV46a,
CG126510-01 Protein
Sequence


MPLLKMPPPFSGCSHPCSGHCGGHCSGPLLPPPSSQPLPSTHRDPGCKGHKFAHSGLACQLPQ 
PCEADEGLGEEEDSSSERSSCTSSSTHQRDGKFCDCCYCEFFGHNAEKEKAQLAAEALKQANR 
VSGSREPRPARERLLEWPDRELDRVNSFLSSRLQEIKNTVKDSIRASFSVCELSMDSNGFSKE 
GAAEPEPQSLPPSNLSGSSEQQPDINLDLSPLTLGSPQNHTLQAPGEPAPPWAEMRGPHPPWT 
EVRGPPPGIVPENGLVRRLNTVPNLSRVIWVKTPKPGYPSSEEPSSKEVPSCKQELPEPVSSG 
GKPQKGKRQGSQAKKSEASPAPRPPASLEVPSAKGQVAGPKQPGRVLELPKVGSCAEAGEGSR
GSRPGPGWAGSPKTEKEKGSSWRNWPGEAKARPQEQESVQPPGPARPQSLPQGKGRSRRSRNK 
QEKPASSLDDVFLPKDMDGVEMDETDREVEYFKRFCLDSAKQTRQKVAVNWTNFSLKKTTPST 
AQ 
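
The molecular weights listed with the protein sequences in Examples 43 to 46 are printed with the unit "kD", but their magnitudes are what would be expected in daltons, since proteins average roughly 110 Da per residue. The sketch below divides each reported value by the reported length as a plausibility check; the interpretation of the unit is an inference, not a statement from the text, and the 105 to 116 window is an arbitrary illustrative tolerance.

```python
# (reported MW figure, reported length in residues) for each protein
reported = {
    "NOV43a": (118400.2, 1062),
    "NOV44a": (68248.9, 602),
    "NOV44b": (63114.9, 557),
    "NOV44c": (34561.3, 307),
    "NOV45a": (53153.8, 480),
    "NOV46a": (54743.6, 506),
}

# Mean mass per residue implied by each reported figure
per_residue = {name: mw / length for name, (mw, length) in reported.items()}

for name, value in per_residue.items():
    # Values near 110 suggest the figures are daltons rather than kilodaltons.
    print(name, round(value, 1), 105 < value < 116)
```

All six ratios fall close to 110, which supports reading the figures as daltons (for example, about 118.4 kDa for NOV43a).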



Further analysis of the NOV46a protein yielded the following properties shown in 
Table 46B. 



Table 46B. Protein Sequence Properties NOV46a

PSort analysis | 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis | No Known Signal Sequence Predicted

A search of the NOV46a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins, shown in Table 46C.

Table 46C. Geneseq Results for NOV46a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV46a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB92785 | Human protein sequence SEQ ID NO:11276 - Homo sapiens, 448 aa. [EP1074617-A2, 07-FEB-2001] | 110..506; 52..448 | 397/397 (100%); 397/397 (100%) | 0.0
ABB55776 | Human polypeptide SEQ ID NO 158 - Homo sapiens, 586 aa. [US2001039335-A1, 08-NOV-2001] | 110..506; 190..586 | 396/397 (99%); 396/397 (99%) | 0.0
AAU39067 | Human secreted protein hb1041_2 - Homo sapiens, 586 aa. [WO200175068-A2, 11-OCT-2001] | 110..506; 190..586 | 396/397 (99%); 396/397 (99%) | 0.0
AAY22498 | Human secreted protein sequence clone hb1041_2 - Homo sapiens, 586 aa. [WO9938959-A1, 05-AUG-1999] | 110..506; 190..586 | 396/397 (99%); 396/397 (99%) | 0.0
ABG08145 | Novel human diagnostic protein #8136 - Homo sapiens, 830 aa. [WO200175067-A2, 11-OCT-2001] | 110..506; 419..816 | 292/404 (72%); 312/404 (76%) | e-155



In a BLAST search of public sequence databases, the NOV46a protein was found to
have homology to the proteins shown in the BLASTP data in Table 46D.



Table 46D. Public BLASTP Results for NOV46a

Protein Accession Number | Protein/Organism/Length | NOV46a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9NW00 | CDNA FLJ10404 fis, clone NT2RM4000486 (Hypothetical 48.8 kDa protein) - Homo sapiens (Human), 448 aa. | 110..506; 52..448 | 397/397 (100%); 397/397 (100%) | 0.0
Q96PV7 | KIAA1931 protein - Homo sapiens (Human), 514 aa (fragment). | 110..506; 118..514 | 397/397 (100%); 397/397 (100%) | 0.0
Q8VCA1 | Similar to hypothetical protein (Hypothetical 48.1 kDa protein) - Mus musculus (Mouse), 443 aa. | 110..506; 52..443 | 325/398 (81%); 350/398 (87%) | 0.0
AAH25483 | Hypothetical protein - Mus musculus (Mouse), 294 aa. | 110..356; 52..294 | 198/248 (79%); 219/248 (87%) | e-106
P78311 | mRNA, complete cds, clone:RES4-22A, - Homo sapiens (Human), 1224 aa. | 25..501; 720..1220 | 153/522 (29%); 236/522 (44%) | 8e-38

PFam analysis predicts that the NOV46a protein contains the domains shown in
Table 46E.



Table 46E. Domain Analysis of NOV46a

Pfam Domain | NOV46a Match Region | Identities; Similarities for the Matched Region | Expect Value



Example 47. 

The NOV47 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 47A. 



Table 47A. NOV47 Sequence Analysis

SEQ ID NO: 179 | 3296 bp

NOV47a,
CG127106-01 DNA
Sequence



CCGCCGTTTATTGTGGCCCCGACAGGCCGGGGTTACTGTGGCGACCACGAGAGCAGCTTTGGC 



GCTATGGAGGAGCCCGGGGCTACCCCTCAACCGTATTTGGGGCTGCTCCTGGAGGAGCTACGC 

AGGGTTAGGAGTCGGCTTTATGTGGGACGAGAGAAAAAGCTTGCTCTAATGCTTTCTGGACTA 

ATTGAAGAAAAAAGTAAACTACTTGAAAAATTTAGCCTTGTTCAAAAAGAGTATGAAGGCTAT

GAAGTAGAGTCATCTTTAAAGGATGCCAGCTTTGAGAAGGAGGCAACAGAAGCACAAAGTTTG 

GAGGCAACCTGTGAAAAGCTGAACAGGTCCAATTCTGAACTTGAGGATGAAATACTCTGTCTA

GAAAAAGAGTTAAAAGAAGAGAAATCCAAACATTCTGAACAAGATGAATTGATGGCGGATATT 

TCAAAAAGGATACAGTCTCTAGAAGATGAGTCAAAATCCCTCAAATCACAAGTAGCTGAAGCC 

AAAATGACCTTCAAGATATTTCAAATGAATGAAGAACGACTGAAGATAGCAATAAAAGATGCT 

TTGAATGAAAATTCTCAACTTCAGGAAAGCCAGAAACAGCTTTTGCAAGAAGCTGAAGTATGG

AAAGAACAAGTGAGTGAACTTAATAAACAGAAAGTAACATTTGAAGACTCCAAAGTACATGCA

GAACAAGTTCTAAATGATAAAGAAAGTCACATCAAGACTCTGACTGAACGCTTGTTAAAGATG

AAAGATTGGGCTGCTATGCTTGGAGAAGACATAACGGATGATGATAACTTGGAATTAGAAATG

AACAGTGAATCGGAAAATGGTGCTTACTTAGATAATCCTCCAAAAGGAGCTTTGAAGAAACTG 

ATTCATGCTGCTAAGTTAAATGCTTCTTTAAAAACCTTAGAAGGAGAAAGAAACCAAATTTAT 

ATTCAGTTGTCTGAAGTTGATAAAACAAAGGAAGAGCTTACAGAGCATATTAAAAATCTTCAG

ACTGAACAAGCATCTTTGCAGTCAGAAAACACACATTTTGAAAATGAGAATCAGAAGCTTCAA 

CAGAAACTTAAAGTAATGACTGAATTATATCAAGAAAATGAAATGAAACTCCACAGGAAATTA 

ACAGTAGAGGAAAATTATCGGTTAGAGAAAGAAGAGAAACTTTCTAAAGTAGATGAAAAGATC 

AGCCATGCCACTGAAGAGCTGGAGACCTATAGAAAGCGAGCCAAAGATCTTGAAGAAGAATTG

GAGAGAACTATTCATTCTTATCAAGGGCAGATTATTTCCCATGAGAAAAAAGCACATGATAAT

TGGTTGGCAGCTCGGAATGCTGAAAGAAACCTCAATGATTTAAGGAAAGAAAATGCTCACAAC

AGACAAAAATTAACTGAAACAGAGCTTAAATTTGAACTTTTAGAAAAAGATCCTTATGCACTC 

GATGTTCCAAATACAGCATTTGGCAGAGAGCATTCCCCATATGGTCCCTCACCATTGGGTTGG

CCTTCATCTGAAACAAGAGCTTTTCTCTCTCCTCCAACTTTGTTGGAGGGTCCACTCACACTC 

TCACCTTTGCTTCCAGGGGGAGGAGGAAGAGGCTCACGAGGCCCAGGGAATCCTTTGGACCAT 

CAGATTACCAATGAAAGAGGAGAATCAAGCTGTGATAGGTTAACCGATCCTCATAGGGCTCTC

TCTGACACTGGGTTTCTGTCACCTCCATGGGACCAGGACCGTAGGATGATGTTTCCTCCGCCA 

GGACAATCATATCCTGATTCAGCCCTTCCTCCACAAAGGCAAGACAGATTTTGTTCTAATTCT

GGTAGACTGTCTGGACCAGCAGAACTCAGAAGTTTTAATATGCCTTCTTTGGATAAAATGGAT 

GGGTCAATGCCTTCAGAAATGGAATCCAGTAGAAATGATACCAAAGATGATCTTGGTAATTTA 

AATGTGCCTGATTCATCTCTCCCTGCTGAAAATGAAGCCACTGGCCCTGGCTTTGTTCCTCCA

CCTCTTGCTCCAGTCAGAGGTCCATTGTTTCCAGTGGATGCAAGAGGCCCATTCTTGAGAAGA 

GGACCTCCTTTCCCCCCACCTCCTCCAGGAGCCATGTTTGGAGCTTCTCGAGATTATTTTCCA 

CCAGGGGATTTCCCAGGTCCACCACCTGCTCCATTTGCAATGAGAAATGTCTATCCACCGAGG 

GGTTTTCCTCCTTACCTTCCCCCAAGACCTGGATTTTTCCCCCCACCCCCACATTCTGAAGGT 

AGAAGTGAGTTCCCCTCAGGTTTGATTCCACCTTCAAATGAGCCTGCTACTGAACATCCAGAA

CCACAGCAAGAAACCTGACAATATTTTTGCTCTCTTCAAAAGTAATTTTGACTGATCTCATTT


TCAGTTTAAGTAACTGCTGTTACTTAAGTGATTACACTTTTGCTCAAATTGAAGCTTAATGGA 


ATTATAATTCTCAGGATAGTATTTTGTAAATAAAGATGATTTAAATATGAATCTTATGAGTAA 


ATTATTTCAATTTTATTTTAGACGGTATAACTATTTCAATTTGATTAATCCACTATTATATAA 


ACAATAGTGGGAGTTTTATATATGTAATCTTGCAGGTGGGGAGGCTTTAAATTCTGAAGTCTG 


TGTCTTTATGCCAAGAACTGTATTTACTGTGGTTGTGGACAAATGTGAAAGTAACTTTATGCT 


TAAATAAATTATAGTTGATTTAAAGATTTGTTTGGCATTGATAATAATAAAATCAGTAGTTTT 


TCTATAACTATGGCTCTATTAATTAACTTTTTTCCTTTTACCAATAACTTTGAGGTGCAAAAC


TCAAACTTATGTGGGTCTTTTGTGTTCAATTATGTTATGACAAATGTGCTCTCTTTCTTGTAA 


ATAGACATGAGTGGCCCAAAGCAACAAATTAATACACTTTTAAAAGTCAAAATTGATTATATT


TTAAAGATAACCAGGATATTATCTAATGGTGAATTGTAGAATTTTGATCTTCTTATTCACTGA 


GTTTCTTGCACGGTTTCTTTATTGCTTTTTTTCCCGCCTGTTCTTTTGTAAGGTATTTACTAT 


TTTCTGTGGAGGATATTGAGATGTACTACAGGATAACTGTAGTGAATGATGTGTCATCATTTT 


GAGCTTTGGACTCAATATCTTTAGTGTTTCCCTAAATCAGATTTGTAGGTCATGTTAAGCTTC 


TTGCACATTAATATGATTATGGAAGGAAAGGCAGTGAAGCATAACTAATAAACATCATAATAC 


TTAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 67 | ORF Stop: TGA at 2347




SEQ ID NO: 180 | 760 aa | MW at 86012.4kD

NOV47a, CG127106-01 Protein Sequence


MEEPGATPQPYLGLLLEELRRVRSRLYVGREKKLALMLSGLIEEKSKLLEKFSLVQKEYEGYE 
VESSLKDASFEKEATEAQSLEATCEKLNRSNSELEDEILCLEKELKEEKSKHSEQDELMADIS 
KRIQSLEDESKSLKSQVAEAKMTFKIFQMNEERLKIAIKDALNENSQLQESQKQLLQEAEVWK 
EQVSELNKQKVTFEDSKVHAEQVLNDKESHIKTLTERLLKMKDWAAMLGEDITDDDNLELEMN 
SESENGAYLDNPPKGALKKLIHAAKLNASLKTLEGERNQIYIQLSEVDKTKEELTEHIKNLQT 
EQASLQSENTHFENENQKLQQKLKVMTELYQENEMKLHRKLTVEENYRLEKEEKLSKVDEKIS 
HATEELETYRKRAKDLEEELERTIHSYQGQIISHEKKAHDNWLAARNAERNLNDLRKENAHNR 
QKLTETELKFELLEKDPYALDVPNTAFGREHSPYGPSPLGWPSSETRAFLSPPTLLEGPLTLS 
PLLPGGGGRGSRGPGNPLDHQITNERGESSCDRLTDPHRALSDTGFLSPPWDQDRRMMFPPPG 
QSYPDSALPPQRQDRFCSNSGRLSGPAELRSFNMPSLDKMDGSMPSEMESSRNDTKDDLGNLN 
VPDSSLPAENEATGPGFVPPPLAPVRGPLFPVDARGPFLRRGPPFPPPPPGAMFGASRDYFPP 
GDFPGPPPAPFAMRNVYPPRGFPPYLPPRPGFFPPPPHSEGRSEFPSGLIPPSNEPATEHPEP 
QQET 
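
The ORF coordinates and the protein length reported in Table 47A can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the start and stop positions are 1-based indices of the first nucleotide of the ATG and TGA codons and that the stop codon is not translated; the function name `orf_protein_length` is hypothetical and is not part of the original disclosure.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Number of residues encoded by an ORF.

    start: 1-based position of the first base of the start codon (ATG).
    stop:  1-based position of the first base of the stop codon (TGA).
    The stop codon itself is not counted as a residue.
    """
    coding_nt = stop - start
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return coding_nt // 3


# NOV47a: ORF start at 67, stop at 2347, reported length 760 aa.
print(orf_protein_length(67, 2347))  # 760
```

Under these assumptions the computed value agrees with the 760 aa length listed for SEQ ID NO: 180.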



Further analysis of the NOV47a protein yielded the following properties shown in 
Table 47B. 



Table 47B. Protein Sequence Properties NOV47a 


PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted






A search of the NOV47a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 47C.

Table 47C. Geneseq Results for NOV47a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV47a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABG05280 | Novel human diagnostic protein #5271 - Homo sapiens, 881 aa. [WO200175067-A2, 11-OCT-2001] | 1..759; 59..867 | 737/809 (91%); 739/809 (91%) | 0.0
ABG20258 | Novel human diagnostic protein #20249 - Homo sapiens, 881 aa. [WO200175067-A2, 11-OCT-2001] | 1..759; 59..867 | 732/809 (90%); 736/809 (90%) | 0.0
AAY77574 | Human cytoskeletal protein (HCYT) (clone 3768043) - Homo sapiens, 806 aa. [WO200006730-A2, 10-FEB-2000] | 1..760; 1..806 | 715/806 (88%); 734/806 (90%) | 0.0
AAB70884 | Human CTAGE-2 protein - Homo sapiens, 754 aa. [WO200127255-A2, 19-APR-2001] | 15..741; 29..754 | 632/727 (86%); 654/727 (89%) | 0.0
AAM30851 | Peptide #4888 encoded by probe for measuring placental gene expression - Homo sapiens, 777 aa. [WO200157272-A2, 09-AUG-2001] | 1..729; 1..775 | 616/775 (79%); 656/775 (84%) | 0.0
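
The identity and similarity percentages in Table 47C follow from the match counts and the alignment length. The sketch below is illustrative only: it assumes the percentage is the integer part of matches divided by aligned positions (truncation rather than rounding), which is consistent with the two rows checked here but is an inference from the table, not something the text states. The helper name `percent` is hypothetical.

```python
def percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching positions, truncated (assumed convention)."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return matches * 100 // aligned


# (identities, similarities, alignment length) as listed in Table 47C.
rows = {
    "ABG20258": (732, 736, 809),
    "AAB70884": (632, 654, 727),
}

for identifier, (identities, similarities, aligned) in rows.items():
    print(identifier, percent(identities, aligned), percent(similarities, aligned))
```

For these rows the sketch reproduces the tabulated values (90% and 90%; 86% and 89%).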


In a BLAST search of public sequence databases, the NOV47a protein was found to
have homology to the proteins shown in the BLASTP data in Table 47D.


Table 47D. Public BLASTP Results for NOV47a

Protein Accession Number | Protein/Organism/Length | NOV47a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
O15320 | Meningioma-expressed antigen 6/11 (MEA6) (MEA11) - Homo sapiens (Human), 804 aa. | 1..760; 1..804 | 758/804 (94%); 759/804 (94%) | 0.0
Q96SG9 | BA500G10.2 (Novel protein similar to meningioma expressed antigen 6 (MEA6) and 11 (MEA11)) - Homo sapiens (Human), 825 aa (fragment). | 1..760; 15..816 | 664/804 (82%); 698/804 (86%) | 0.0
Q96RT6 | CTAGE-2 - Homo sapiens (Human), 754 aa. | 15..741; 29..754 | 632/727 (86%); 654/727 (89%) | 0.0
AAH31065 | Similar to Meningioma-expressed antigen 6/11 (MEA6) (MEA11) - Homo sapiens (Human), 745 aa. | 15..730; 29..743 | 628/716 (87%); 650/716 (90%) | 0.0
O95046 | WUGSC:H_DJ0988G15.3 protein (DJ1005H11.2) (WUGSC:H_DJ0988G15.3 protein) - Homo sapiens (Human), 777 aa. | 1..729; 1..775 | 612/775 (78%); 655/775 (83%) | 0.0









PFam analysis predicts that the NOV47a protein contains the domains shown in
Table 47E.

Table 47E. Domain Analysis of NOV47a

Pfam Domain | NOV47a Match Region | Identities/Similarities for the Matched Region | Expect Value





Example 48. 

The NOV48 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 48A.



Table 48A. NOV48 Sequence Analysis 



NOV48a, CG127340-01 DNA Sequence | SEQ ID NO: 181 | 4797 bp



CTGGCTGGCTGGCTGTACACCTCGCTGCCGCCGGGCTTCGGGGGCACCGGGGAAGACTACAGC 



GAAGAGGGGATCCACTTCCCGTGGCTGCCGGCGCTCGTGTGCACCGGCGCGGTATTCCTGGGC 



GCCGAAATCTGCCCCTACTCAGTGTGTTCGCGGCACGGGCTGGCCATCGCCTCGCACAGCGTG 



TGCCTGACCCGGCTTCTG ATGGCAGCCGCCTTCCCCGTGTGCTACCCGCTGGGCCGCCTGCTG 



GACTGGGCGCTGCGCCAGGAGATAAGCACCTTCTACACGCGGGAGAAGTTGCTGGAGACGTTG 
CGGGCCGCAGACCCCTACAGTGACCTGGTGAAGGAGGAGCTCAACATCATACAGGGTGCCCTG 
GAGCTGCGCACCAAAGTGGTGGAGGAGGTGCTGACCCCCCTGGGAGACTGCTTCATGCTGCGC 
TCAGACGCGGTGCTCGACTTCGCCACTGTCTCCGAGATCCTGCGCAGCGGCTACACTCGCATC 
CCAGTGTACGAGGGTGACCAGCGGCACAACATTGTGGACATTTTATTTGTCAAGGACTTGGCC 
TTCGTGGACCCCGACGACTGCACCCCGCTCCTCACTGTCACCCGCTTCTACAACCGGCCCCTG 
CATTGTGTTTTCAATGACACCCGACTGGACACGGTTCTGGAGGAGTTTAAGAAGGGAAAATCT 
CACCTGGCCATTGTCCAGCGGGTGAATAATGAGGGAGAAGGGGACCCTTTCTATGAGGTGATG 
GGCATTGTCACGCTGGAGGATATCATAGAGGAGATTATCAAGTCGGAGATCCTGGATGAAACT 
GATCTCTACACTGACAATCGGAAAAAGCAGAGGGTCCCGCAACGGGAGCGGAAGCGGCATGAC 
TTCTCCTTGTTTAAGCTTTCGGACACGGAGATGCGGGTGAAGATCTCACCACAGCTTCTGCTA 
GCCACACACCGCTTCATGGCCACAGAAGTGGAGCCCTTTAAGTCTCTGTACCTTTCGGAGAAG 
ATCCTGCTCCGGCTCCTGAAACATCCCAACGTGATCCAGGAGCTGAAGTTTGATGAGAAGAAC 
AAGAAGGCCCCGGAACACTACCTCTACCAGCGCAACCGCCCTGTGGACTACTTTGTGCTGCTT 
CTACAGGGTAAAGTGGAGGTGGAGGTTGGTAAGGAAGGCCTTCGCTTTGAAAATGGAGCCTTT 
ACTTACTATGGCGTCCCAGCCATCATGACCACTGCTTGCTCAGATAATGACGTGCGGAAGGTT 
GGAAGTCTGGCTGGATCTTCTGTCTTTCTAAACCGGTCCCCTTCTCGCTGCAGTGGGTTGAAT 
CGCTCTGAGTCTCCAAACCGAGAGCGCAGTGACTTTGGGGGCAGCAACACCCAGCTGTACAGC 
AGCAGCAACAACCTCTACATGCCTGACTACTCAGTCCACATCCTCAGCGATGTGCAGTTTGTG 
AAGATCACACGGCAGCAATATCAGAACGCACTCACTGCCTGCCACATGGACAGCTCACCTCAG 
TCCCCTGACATGGAGGCCTTCACAGACGGGGACTCCACTAAGGCCCCCACAACCCGGGGCACA 
CCCCAGACCCCTAAGGATGACCCCGCCATCACGCTCCTCAACAACAGGAACAGCCTGCCGTGC 
AGCCGCTCAGACGGGCTGAGAAGCCCCAGCGAGGTAGTGTACCTGAGGATGGAGGAGCTGGCC 
TTCACCCAGGAAGAAATGACTGACTTCGAGGAGCACAGCACACAGCAGCTCACGCTGTCTCCT 
GCAGCCGTTCCCACGAGAGCAGCATCAGATAGTGAATGTTGTAACATCAACCTGGATACAGAG 
ACCAGCCCCTGCAGTAGCGATTTTGAGGAAAACGTGGGCAAGAAGCTGCTGAGAACCTTGAGT 
GGCCAAAAAAGGAAGAGGTCACCAGAAGGAGAGAGAACCTCTGAGGACAACTCCAATTTAACA
CCTCTGATCACATGA CAGGGCAAAGCCAGCATTCACTGGGTGTGTGAAATTCCAGAGCTTTGG
GGGAGAATCCACCCTCCCATCATCTGCTTCCCCCAAGGCCTCCCACAGGTGACAGAATGTTCT 



GCCTTCCCTTCCATCTCTTCACCCCTAGCTGTCAGTTTGGCAGATTTTCCCTCGTTACCTCCA 
GTTCGACTCAGAACCTTGACATGGCCATAACAGAAGGAGGTGCCTCTGATAGAACATGCTAGA
AATGGTCTTTTCCACAGCATAGTCTGGGACTGGAAAAGAGATGTCTGACTGCAAGCTGACAAT



GCCACTCTGGGACCCCTGATGCTCTTCTTTGTTCTTTGGGTCCCCTGATGCCATAGGAGACCT 
ATCGTCTTGGAACTTGCCATTCTTTCCTCCAGAACAAAATGTTAACTTTCTAACACATTTCAT 



GCATAGCTTGGCTCAGAAGGTGCCATTGGCAGACAGGCACATGGGAGGCTGGAGTAGGAGGTC 



TGAAGATTAGTTCAGGGGATGGACCAAGAATTTCCCCCAGAGCTTTAAAGAAGTGGGACTCAG 



CCATGTTGGCGCGTGATTGACATTACAGCACAGAAAACTGTTAGTGACTGGTTTCCTGTTAGA



TAAGGGTTCCAGCAGCCTGGGGCAGTATGTCTCAGCTGGAATGGAAAGAATGTGAGATGGAAC 



CTCAAGTCACTGTTTTTACCAGGGACACATCTGTTTTGGCTCCCAATCAGCAGTCTTCAATCG



ATCAATAATTCTGCTCTGGAAGAGAAGGAACAGGGAGCAGAGAGACCCAACTGGGAGCCAGAG 



ATGGAACTTCAGGTCTTAAGTGCAAATCAAAGCAAAAAACAAACAAAACTTACATGGAAAAAC 



TGTAAGTGCTGAAAGCAAGTTTAGCCATGACAAACCAAAGAGTGCCCAGGTCAGCCAAGAAAG 



ATACATAATCTCATGGGACTTCAGTGGGAGTTACACAGGAATGTTGAAGAATCATTCTTCTTT 



TTCATGCATTTGTCCTTCTCCCACCCCCTTACTACACCCTAGCAGATCAGCTGAGTGTACTTT 



ATTCCAAGAACTTACTGGATCTCTGGTTTTTCTCCTGAAGTTGGGGCAGGTGCAATTCCAAGC 



ATAACCACCAGATGGCAGAGTGACCGCGCATACCTGCTTCCAAGAATAAAACAGTTCTGAAAA 



GCAACCGCAAAGCCGGGCGCGGTGGCTCACACCTGTAATCCCAACAGTTTGGAAGACCGAGGC 



GGGTGGATCACTTGAAGTCAGGAGTTTGAGACCAGCCTGGCCAACATGGTGAAACCCCCATCT 



CTCCTGGGCATGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGGA 



GGCAGAGGTTGCAGTGAGCTGAGATCACACCACTGCACTTCATCCTAGGCAACAGAGCGAGAC 



TCTGTCTCCCCCCTCAAAAAAAAAAAAGAAAGAAAGAAAGAGAAAAGAAAAGAAAAGCAACCA 



CAGCCAGCCTTAGGGAAAACTTGGAAGTAAGTGAAATTTGTCTTCAGAATAACTATCTCCCTT 



TCTGATCTGTCTCCTACTCTTTAGATGTTCTCAGTCAAGTACTCACTGAACTCATTGATCGAG 



TGCTGTCTGCTAAATCTCCAAACCATTCCCAAACCTTTCCCCGTAGTATACCATCCAGCTTCC 



CTCCCCTTCCTCCAAAACCCTCCCTCCCACCTCCCCACACCCATTGAGTCATTCACAGGCAGG 



AGGGAGACTGATCATTCCTCTGGGTTATCTGCATCTCAAAAGAAAATGCTTACCCACAGGAAC 



TGTTAACTCAGGGGTTCTTAACTTGGGGTCCATCACCCCAGGGGGTCCATAGTTGGACTTCAG 



AGGGTCTATGAAACCCCTAAAACTGTCAGTATTTAATGTATTTATTCTTGTATGTTTTTTCCA



GAGCATTAAAGCTTTCATAAGGTTCTCAAAGGTCTCAGACCTACAAAGAGTTAAAACAAACAG 



ACAAACAAAAAAAAACACTACTATTTTAAATAGTGGAACTTTCAGCCCAGCGTTTCTGCAATG 



CAGAGTGAAGTGGATACTGGGCAGTTCGAGACAGGTTTTTAATCATAAGTGGTCTTTTCAAAT 



GTCCATCAATTGATGGGGAAGGCTGGCACCCACCAAGAAGTGGAAGTCCTCAGAAATTCTCGG 



CACACCCTAGAGTATTGTACAACCAACACCCCCACATAACTTTGTCCCCTCTTCCCCAACAAC 



CCAGAGCAGGTGTTGCAGACAGGAGGGCCACAGCGTGTGGAAGTAAAGACTTTGGAGCTAGAG 



ATGCCTTTTCCAGCAATGATTATTGACTTCACCACACCCCTTGCCTGGCCTGGCCTGAGGCTC 
AGCAGTGCATGACTTCTCGTAGATAACTTCACAGTCATCCAGTCCCAACACCTGCTCTTGCCT 



GGTAGGAACAGGCGAAGTGTCAGCCCTCAATGTTGGGTACTTAGACCCAAACCAATAAATGGT 



GAGTTTTGAACAAGAACTACCATCATGCAGGCTTCTTGCCCAGCTGACCACTGGCCCCGGGGT 



GCCTGCCTGGCTGGTCTTCATCACCTGAGGCCACCAGGCTCAAGCCACTGCTGTTGCATTACA 



CCCATCCCTTTGCAAAATCCCTATGGAGCCTGTCACCACTCCCCTCCCTATATACCCCCACCC 



CACAAAGATTTTCTTCAGGTTAAAAAAAAAGTTTAAAAAAAAGATTTTAAAATAAAGCATTTA 



TGAAGGCTTAATAAATTGTAAATAATTTTTAAATAAAATGAAAATGCCTTTCCTGGAAAAAAA 



AAAAAAAAA 



ORF Start: ATG at 208 | ORF Stop: TGA at 1966

SEQ ID NO: 182 | 586 aa | MW at 66558.3kD

NOV48a, CG127340-01 Protein Sequence



MAAAFPVCYPLGRLLDWALRQEISTFYTREKLLETLRAADPYSDLVKEELNIIQGALELRTKV
VEEVLTPLGDCFMLRSDAVLDFATVSEILRSGYTRIPVYEGDQRHNIVDILFVKDLAFVDPDD 
CTPLLTVTRFYNRPLHCVFNDTRLDTVLEEFKKGKSHLAIVQRVNNEGEGDPFYEVMGIVTLE 
DIIEEIIKSEILDETDLYTDNRKKQRVPQRERKRHDFSLFKLSDTEMRVKISPQLLLATHRFM 
ATEVEPFKSLYLSEKILLRLLKHPNVIQELKFDEKNKKAPEHYLYQRNRPVDYFVLLLQGKVE
VEVGKEGLRFENGAFTYYGVPAIMTTACSDNDVRKVGSLAGSSVFLNRSPSRCSGLNRSESPN 
RERSDFGGSNTQLYSSSNNLYMPDYSVHILSDVQFVKITRQQYQNALTACHMDSSPQSPDMEA 
FTDGDSTKAPTTRGTPQTPKDDPAITLLNNRNSLPCSRSDGLRSPSEVVYLRMEELAFTQEEM
TDFEEHSTQQLTLSPAAVPTRAASDSECCNINLDTETSPCSSDFEENVGKKLLRTLSGQKRKR 
SPEGERTSEDNSNLTPLIT 



SEQ ID NO: 183 



NOV48b, CG127340-02 DNA Sequence



3711 bp



CTGGCTGGCTGGCTGTACACCTCGCTGCCGCCGGGCTTCGGGGGCACCGGGGAAGACTACAGC 



GAAGAGGGGATCCACTTCCCGTGGCTGCCGGCGCTCGTGTGCACCGGCGCGGTATTCCTGGGC 
GCCGAAATCTGCCCCTACTCAGTGTGTTCGCGGCACGGGCTGGCCATCGCCTCGCACAGCGTG 
TGCCTGACCCGGCTTCTG ATGGCAGCCGCCTTCCCCGTGTGCTACCCGCTGGGCCGCCTGCTG 
GACTGGGCGCTGCGCCAGGAGATAAGCACCTTCTACACGCGGGAGAAGTTGCTGGAGACGTTG
CGGGCCGCAGACCCCTACAGTGACCTGGTGAAGGAGGAGCTCAACATCATACAGGGTGCCCTG
GAGCTGCGCACCAAAGTTGTGGAGGAGGTGCTGGCCCCCCTGGGAGACTGCTTCATGCTGCGC 
TCAGACGCGGTGCTCGACTTCGCCACTGTCTCCGAGATCCTGCGCAGCGGCTACACTCGCATC 
CCAGTGTACGAGGGTGACCAGCGGCACAACATTGTGGACATTTTATTTGTCAAGGACTTGGCC 
TTCGTGGACCCCGACGACTGCACCCCGCTCCTCACTGTCACCCGCTTCTACAACCGGCCCCTG 
CATTGTGTTTTCAATGACACCCGACTGGACACGGTTCTGGAGGAGTTTAAGAAGGCATCAGAT
AGTGAATGTTGTAACATCAACCTGGATACAGAGACCAGCCCCTGCAGTAGCGATTTTGAGGAA 
AACGTGGGCAAGAAGCTGCTGAGAACCTTGAGTGGCCAAAAAAGGAAGAGGTCACCAGAAGGA 
GAGAGAACCTCTGAGGACAACTCCAATTTAACACCTCTGATCACATG ACAGGGCAAAGCCAGC 
ATTCACTGGGTGTGTGAAATTCCAGAGCTTTGGGGGAGAATCCACCCTCCCATCATCTGCTTC 



CCCCAAGGCCTCCCACAGGTGACAGAATGTTCTGCCTTCCCTTCCATCTCTTCACCCCTAGCT 



GTCAGTTTGGCAGATTTTCCCTCGTTACCTCCAGTTCGACTCAGAACCTTGACATGGCCATAA



CAGAAGGAGGTGCCTCTGATAGAACATGCTAGAAATGGTCTTTTCCACAGCATAGTCTGGGAC



TGGAAAAGAGATGTCTGACTGCAAGCTGACAATGCCACTCTGGGACCCCTGATGCTCTTCTTT 



GTTCTTTGGGTCCCCTGATGCCATAGGAGACCTATCGTCTTGGAACTTGCCATTCTTTCCTCC 



AGAACAAAATGTTAACTTTCTAACACATTTCATGCATAGCTTGGCTCAGAAGGTGCCATTGGC 



AGACAGGCACATGGGAGGCTGGAGTAGGAGGTCTGAAGATTAGTTCAGGGGATGGACCAAGAA 



TTTCCCCCAGAGCTTTAAAGAAGTGGGACTCAGCCATGTTGGCGCGTGATTGACATTACAGCA



CAGAAAACTGTTAGTGACTGGTTTCCTGTTAGATAAGGGTTCCAGCAGCCTGGGGCAGTATGT 



CTCAGCTGGAATGGAAAGAATGTGAGATGGAACCTCAAGTCACTGTTTTTACCAGGGACACAT 



CTGTTTTGGCTCCCAATCAGCAGTCTTCAATCGATCAATAATTCTGCTCTGGAAGAGAAGGAA 



CAGGGAGCAGAGAGACCCAACTGGGAGCCAGAGATGGAACTTCAGGTCTTAAGTGCAAATCAA 



AGCAAAAAACAAACAAAACTTACATGGAAAAACTGTAAGTGCTGAAAGCAAGTTTAGCCATGA 



CAAACCAAAGAGTGCCCAGGTCAGCCAAGAAAGATACATAATCTCATGGGACTTCAGTGGGAG 



TTACACAGGAATGTTGAAGAATCATTCTTCTTTTTCATGCATTTGTCCTTCTCCCACCCCCTT 



ACTACACCCTAGCAGATCAGCTGAGTGTACTTTATTCCAAGAACTTACTGGATCTCTGGTTTT 



TCTCCTGAAGTTGGGGCAGGTGCAATTCCAAGCATAACCACCAGATGGCAGAGTGACCGCGCA 



TACCTGCTTCCAAGAATAAAACAGTTCTGAAAAGCAACCGCAAAGCCGGGCGCGGTGGCTCAC 



ACCTGTAATCCCAACAGTTTGGAAGACCGAGGCGGGTGGATCACTTGAAGTCAGGAGTTTGAG 



ACCAGCCTGGCCAACATGGTGAAACCCCCATCTCTCCTGGGCATGTAGTCCCAGCTACTCGGG 



AGGCTGAGGCAGGAGAATCGCTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCTGAGATCACAC 



CACTGCACTTCATCCTAGGCAACAGAGCGAGACTCTGTCTCCCCCCTCAAAAAAAAAAAAGAA 



AGAAAGAAAGAGAAAAGAAAAGAAAAGCAACCACAGCCAGCCTTAGGGAAAACTTGGAAGTAA 



GTGAAATTTGTCTTCAGAATAACTATCTCCCTTTCTGATCTGTCTCCTACTCTTTAGATGTTC 
TCAGTCAAGTACTCACTGAACTCATTGATCGAGTGCTGTCTGCTAAATCTCCAAACCATTCCC 



AAACCTTTCCCCGTAGTATACCATCCAGCTTCCCTCCCCTTCCTCCAAAACCCTCCCTCCCAC 



CTCCCCACACCCATTGAGTCATTCACAGGCAGGAGGGAGACTGATCATTCCTCTGGGTTATCT 



GCATCTCAAAAGAAAATGCTTACCCACAGGAACTGTTAACTCAGGGGTTCTTAACTTGGGGTC 



CATCACCCCAGGGGGTCCATAGTTGGACTTCAGAGGGTCTATGAAACCCCTAAAACTGTCAGT 



ATTTAATGTATTTATTCTTGTATGTTTTTTCCAGAGCATTAAAGCTTTCATAAGGTTCTCAAA 



GGTCTCAGACCTACAAAGAGTTAAAACAAACAGACAAACAAAAAAAAACACTACTATTTTAAA 



TAGTGGAACTTTCAGCCCAGCGTTTCTGCAATGCAGAGTGAAGTGGATACTGGGCAGTTCGAG 



ACAGGTTTTTAATCATAAGTGGTCTTTTCAAATGTCCATCAATTGATGGGGAAGGCTGGCACC 



CACCAAGAAGTGGAAGTCCTCAGAAATTCTCGGCACACCCTAGAGTATTGTACAACCAACACC 



CCCACATAACTTTGTCCCCTCTTCCCCAACAACCCAGAGCAGGTGTTGCAGACAGGAGGGCCA



CAGCGTGTGGAAGTAAAGACTTTGGAGCTAGAGATGCCTTTTCCAGCAATGATTATTGACTTC 



ACCACACCCCTTGCCTGGCCTGGCCTGAGGCTCAGCAGTGCATGACTTCTCGTAGATAACTTC 



ACAGTCATCCAGTCCCAACACCTGCTCTTGCCTGGTAGGAACAGGCGAAGTGTCAGCCCTCAA 



TGTTGGGTACTTAGACCCAAACCAATAAATGGTGAGTTTTGAACAAGAACTACCATCATGCAG 



GCTTCTTGCCCAGCTGACCACTGGCCCCGGGGTGCCTGCCTGGCTGGTCTTCATCACCTGAGG 



CCACCAGGCTCAAGCCACTGCTGTTGCATTACACCCATCCCTTTGCAAAATCCCTATGGAGCC 



TGTCACCACTCCCCTCCCTATATACCCCCACCCCACAAAGATTTTCTTCAGGTTAAAAAAAAA 



GTTTAAAAAAAAGATTTTAAAATAAAGCATTTATGAAGGCTTAATAAATTGTAAATAATTTTT



AAATAAAATGAAAATGCCTTTCCTGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 



ORF Start: ATG at 208 | ORF Stop: TGA at 865

SEQ ID NO: 184 | 219 aa | MW at 24860.9kD

NOV48b, CG127340-02 Protein Sequence



MAAAFPVCYPLGRLLDWALRQEISTFYTREKLLETLRAADPYSDLVKEELNIIQGALELRTKV 
VEEVLAPLGDCFMLRSDAVLDFATVSEILRSGYTRIPVYEGDQRHNIVDILFVKDLAFVDPDD
CTPLLTVTRFYNRPLHCVFNDTRLDTVLEEFKKASDSECCNINLDTETSPCSSDFEENVGKKL
LRTLSGQKRKRSPEGERTSEDNSNLTPLIT
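
The relationship between the DNA and protein listings in Table 48A can be illustrated by translating the first codons of the ORF. This is a minimal sketch, assuming the standard genetic code and that whitespace inside a listed sequence is an extraction artifact to be ignored; the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop stray whitespace from extraction
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 27 nucleotides of the NOV48a ORF, starting at the ATG at position 208.
print(translate("ATGGCAGCC GCCTTCCCCGTGTGCTAC"))  # MAAAFPVCY
```

The result matches the first nine residues of the NOV48a and NOV48b protein sequences listed above.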



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 48B.

Table 48B. Comparison of NOV48a against NOV48b.

Protein Sequence | NOV48a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV48b | 1..227; 1..212 | 171/227 (75%); 180/227 (78%)



Further analysis of the NOV48a protein yielded the following properties shown in 
Table 48C. 

Table 48C. Protein Sequence Properties NOV48a

PSort analysis: 0.6000 probability located in nucleus; 0.4644 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.1632 probability located in mitochondrial inner membrane

SignalP analysis: Cleavage site between residues 21 and 22



A search of the NOV48a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 48D.



Table 48D. Geneseq Results for NOV48a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV48a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAU16931 | Human novel secreted protein, SEQ ID 172 - Homo sapiens, 377 aa. [WO200155441-A2, 02-AUG-2001] | 211..586; 2..377 | 375/376 (99%); 376/376 (99%) | 0.0
AAB95271 | Human protein sequence SEQ ID NO: 17469 - Homo sapiens, 633 aa. [EP1074617-A2, 07-FEB-2001] | 1..481; 141..624 | 321/492 (65%); 376/492 (76%) | e-173
AAB95413 | Human protein sequence SEQ ID NO: 17804 - Homo sapiens, 853 aa. [EP1074617-A2, 07-FEB-2001] | 1..481; 383..844 | 318/487 (65%); 371/487 (75%) | e-172
AAE20847 | Human gene 18 encoded secreted protein fragment, SEQ ID NO: 109 - Homo sapiens, 466 aa. [WO200218435-A1, 07-MAR-2002] | 1..475; 1..456 | 288/482 (59%); 355/482 (72%) | e-157
AAE20846 | Human gene 18 encoded secreted Homo sapiens, 569 aa. [WO200218435-A1, 07-MAR-2002] | 1..475; 104..559 | 288/482 (59%); 355/482 (72%) | e-157



In a BLAST search of public sequence databases, the NOV48a protein was found to
have homology to the proteins shown in the BLASTP data in Table 48E.



Table 48E. Public BLASTP Results for NOV48a

Protein Accession Number | Protein/Organism/Length | NOV48a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9NRU3 | Ancient conserved domain protein 1 - Homo sapiens (Human), 586 aa. | 1..586; 1..586 | 586/586 (100%); 586/586 (100%) | 0.0
Q9JIQ6 | Ancient conserved domain protein 1 - Mus musculus (Mouse), 586 aa. | 1..586; 1..586 | 551/586 (94%); 569/586 (97%) | 0.0
Q9NRK5 | Ancient conserved domain protein 2 - Homo sapiens (Human), 633 aa. | 1..481; 141..624 | 321/492 (65%); 376/492 (76%) | e-173
Q9H952 | CDNA FLJ13004 fis, clone NT2RP3000439, weakly similar to hypothetical 46.4 kDa protein in FFH-GRPE intergenic region - Homo sapiens (Human), 633 aa. | 1..481; 141..624 | 321/492 (65%); 376/492 (76%) | e-173
Q9J1M8 | Ancient conserved domain protein 2 - Mus musculus (Mouse), 693 aa. | 1..481; 201..684 | 320/492 (65%); 375/492 (76%) | e-172






PFam analysis predicts that the NOV48a protein contains the domains shown in
Table 48F.

Table 48F. Domain Analysis of NOV48a

Pfam Domain | NOV48a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 49. 

The NOV49 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 49A.

Table 49A. NOV49 Sequence Analysis

NOV49a, CG128310-01 DNA Sequence | SEQ ID NO: 185 | 1137 bp

ATGGACTCGCTGGCAGCGCCCCAGGACCGCCTGGTGGAGCAGCTGCTGTCGCCGCGGACCCAG 
GCCCAGAGGCGGCTCAAGGACATTGACAAGCAGTACGTGGGCTTCGCCACACTGCCCAACCAG 
GTGCACCGCAAGTCGGTGAAGAAAGGCTTTGACTTCACACTCATGGTGGCTGGTGAGTCAGGC 
CTGGGGAAGTCCACACTGGTCCACAGCCTCTTCCTGACAGACTTGTACAAGGACCGGAAGCTG 
CTCAGTGCTGAGGAGCGCATCAGCCAGACGGTAGAGATTCTAAAACACACGGTGGACATTGAG 
GAGAAGGGAGTCAAGCTGAAGCTCACCATCGTGGACACGCCGGGATTCGGGGACGCTGTCAAC 
AACACCGAGTGCTGGAAGCCCATCACCGACTATGTGGACCAGCAGTTTGAGCAGTACTTCCGT 
GATGAGAGCGGCCTCAACCGAAAGAACATCCAAGACAACCGAGTGCACTGCTGCCTATACTTC
ATCTCCCCCTTCGGGCATGGGCTGCGGCCAGTGGATGTGGGTTTCATGAAGGCATTGCATGAG 
AAGGTCAACATCGTGCCTCTCATCGCCAAAGCTGACTGTCTTGTCCCCAGTGAGATCCGGAAG 
CTGAAGGAGCGGATCCGGGAGGAGATTGACAAGTTTGGGATCCATGTATACCAGTTCCCTGAG 
TGTGACTCGGACGAGGATGAGGACTTCAAGCAGCAGGACCGGGAACTGAAGGAGAGCGCGCCC 
TTCGCCGTTATAGGCAGCAACACGGTGGTGGAGGCCAAGGGGCAGCGGGTCCGGGGCCGACTG 
TACCCCTGGGGGATCGTGGAGGTGGAGAACCAGGCGCATTGCGACTTCGTGAAGCTGCGCAAC 
ATGCTCATCCGCACGCATATGCACGACCTCAAGGACGTGACGTGCGACGTGCACTACGAGAAC 
TACCGCGCGCACTGCATCCAGCAGATGACCAGCAAACTGACCCAGGACAGCCGCATGGAGAGC 
CCCATCCCGATCCTGCCGCTGCCCACCCCGGACGCCGAGACTGAGAAGCTTATCAGGATGAAG
GATGAGGAACTGAGGCGCATGCAGGAGATGCTGCAGAGGATGAAGCAGCAGATGCAGGACCAG
TGA 




ORF Start: ATG at 1 | ORF Stop: TGA at 1135

SEQ ID NO: 186 | 378 aa | MW at 43844.8kD

NOV49a, CG128310-01 Protein Sequence


MDSLAAPQDRLVEQLLSPRTQAQRRLKDIDKQYVGFATLPNQVHRKSVKKGFDFTLMVAGESG 
LGKSTLVHSLFLTDLYKDRKLLSAEERISQTVEILKHTVDIEEKGVKLKLTIVDTPGFGDAVN 
NTECWKPITDYVDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPVDVGFMKALHE 
KVNIVPLIAKADCLVPSEIRKLKERIREEIDKFGIHVYQFPECDSDEDEDFKQQDRELKESAP 
FAVIGSNTVVEAKGQRVRGRLYPWGIVEVENQAHCDFVKLRNMLIRTHMHDLKDVTCDVHYEN
YRAHCIQQMTSKLTQDSRMESPIPILPLPTPDAETEKLIRMKDEELRRMQEMLQRMKQQMQDQ 




NOV49b, CG128310-02 DNA Sequence | SEQ ID NO: 187 | 1113 bp


ATGAGCACAGGCCTGCGGTACAAGAGCAAGCTGGCGACCCCAGAGGACAAGCAGGACATTGAC 
AAGCAGTACGTGGGCTTCGCCACACTGCCCAACCAGGTGCACCGCAAGTCGGTGAAGAAAGGC
TTTGACTTCACACTCATGGTGGCTGGTGAGTCAGGCCTGGGGAAGTCCACACTGGTCCACAGC
CTCTTCCTGACAGACTTGTACAAGGACCGGAAGCTGCTCAGTGCTGAGGAGCGCATCAGCCAG
ACGGTAGAGATTCTAAAACACACGGTGGACATTGAGGAGAAGGGAGTCAAGCTGAAGCTCACC 
ATCGTGGACACGCCGGGATTCGGGGACGCTGTCAACAACACCGAGTGCTGGAAGCCCATCACC 
GACTATGTGGACCAGCAGTTTGAGCAGTACTTCCGTGATGAGAGCGGCCTCAACCGAAAGAAC 
ATCCAAGACAACCGAGTGCACTGCTGCCTATACTTCATCTCCCCCTTCGGGCATGGGCTGCGG 
CCAGTGGATGTGGGTTTCATGAAGGCATTGCATGAGAAGGTCAACATCGTGCCTCTCATCGCC 
AAAGCTGACTGTCTTGTCCCCAGTGAGATCCGGAAGCTGAAGGAGCGGATCCGGGAGGAGATT 
GACAAGTTTGGGATCCATGTATACCAGTTCCCTGAGTGTGACTCGGACGAGGATGAGGACTTC 
AAGCAGCAGGACCGGGAACTGAAGGAGAGCGCGCCCTTCGCCGTTATAGGCAGCAACACGGTG
GTGGAGGCCAAGGGGCAGCGGGTCCGGGGCCGACTGTACCCCTGGGGGATCGTGGAGGTGGAG 
AACCAGGCGCATTGCGACTTCGTGAAGCTGCGCAACATGCTCATCCGCACGCATATGCACGAC 
CTCAAGGACGTGACGTGCGACGTGCACTACGAGAACTACCGCGCGCACTGCATCCAGCAGATG 
ACCAGCAAACTGACCCAGGACAGCCGCATGGAGAGCCCCATCCCGATCCTGCCGCTGCCCACC 
CCGGACGCCGAGACTGAGAAGCTTATCAGGATGAAGGATGAGGAACTGAGGCGCATGCAGGAG 
ATGCTGCAGAGGATGAAGCAGCAGATGCAGGACCAGTGAAAG 




ORF Start: ATG at 1 | ORF Stop: TGA at 1108

SEQ ID NO: 188 | 369 aa | MW at 42776.6kD

NOV49b, CG128310-02 Protein Sequence


MSTGLRYKSKLATPEDKQDIDKQYVGFATLPNQVHRKSVKKGFDFTLMVAGESGLGKSTLVHS 
LFLTDLYKDRKLLSAEERISQTVEILKHTVDIEEKGVKLKLTIVDTPGFGDAVNNTECWKPIT 
DYVDQQFEQYFRDESGLNRKNIQDNRVHCCLYFISPFGHGLRPVDVGFMKALHEKVNIVPLIA
KADCLVPSEIRKLKERIREEIDKFGIHVYQFPECDSDEDEDFKQQDRELKESAPFAVIGSNTV
VEAKGQRVRGRLYPWGIVEVENQAHCDFVKLRNMLIRTHMHDLKDVTCDVHYENYRAHCIQQM
TSKLTQDSRMESPIPILPLPTPDAETEKLIRMKDEELRRMQEMLQRMKQQMQDQ

Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 49B.



Table 49B. Comparison of NOV49a against NOV49b.

Protein Sequence | NOV49a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV49b | 27..378; 18..369 | 336/352 (95%); 337/352 (95%)
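
The residue ranges in Table 49B can be checked against the alignment length in the denominators. The sketch below assumes ranges are written as inclusive `start..end` pairs; `span_length` is a hypothetical helper introduced only for illustration.

```python
def span_length(residue_range: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start_text, end_text = residue_range.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


# NOV49a residues 27..378 aligned to NOV49b residues 18..369.
print(span_length("27..378"), span_length("18..369"))  # 352 352
```

Both spans cover 352 residues, which agrees with the denominator of the identities and similarities reported for the matched region.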



Further analysis of the NOV49a protein yielded the following properties shown in
Table 49C.



Table 49C. Protein Sequence Properties NOV49a 


PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV49a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 49D.



Table 49D. Geneseq Results for NOV49a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV49a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB90771 | Human Tumour Endothelial Marker polypeptide SEQ ID NO 275 - Homo sapiens, 369 aa. [WO200210217-A2, 07-FEB-2002] | 27..378; 18..369 | 351/352 (99%); 352/352 (99%) | 0.0
AAB23259 | Human cell division regulator HCDR-1 - Homo sapiens, 478 aa. [US6121019-A, 19-SEP-2000] | 30..377; 121..476 | 277/357 (77%); 312/357 (86%) | e-164
AAW73971 | Human HCDR-1 protein sequence - Homo sapiens, 478 aa. [US5871973-A, 16-FEB-1999] | 30..377; 121..476 | 277/357 (77%); 312/357 (86%) | e-164
AAY23782 | Human cell division regulator (HCDR) 1 - Homo sapiens, 478 aa. [US5928899-A, 27-JUL-1999] | 30..377; 121..476 | 277/357 (77%); 312/357 (86%) | e-164
AAG78669 | Human bradeion protein #2 - Homo sapiens, 478 aa. [JP2001161384-A, 19-JUN-2001] | 30..377; 121..476 | 276/357 (77%); 310/357 (86%) | e-162


In a BLAST search of public sequence databases, the NOV49a protein was found to
have homology to the proteins shown in the BLASTP data in Table 49E.


Table 49E. Public BLASTP Results for NOV49a

Protein Accession Number | Protein/Organism/Length | NOV49a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9BGQ3 | Hypothetical 43.8 kDa protein - Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey), 378 aa. | 1..378; 1..378 | 377/378 (99%); 378/378 (99%) | 0.0
Q9JJM9 | CDCrel-1A - Rattus norvegicus (Rat), 378 aa. | 1..378; 1..378 | 372/378 (98%); 377/378 (99%) | 0.0
Q8R2F7 | CDCrel-1AI - Rattus norvegicus (Rat), 365 aa. | 1..362; 1..362 | 357/362 (98%); 360/362 (98%) | 0.0
Q99648 | Septin (H5) - Homo sapiens (Human), 417 aa (fragment). | 27..378; 66..417 | 351/352 (99%); 352/352 (99%) | 0.0
Q99719 | Septin 5 (Peanut-like protein 1) (Cell division control related protein 1) (CDCREL-1) - Homo sapiens (Human), 369 aa. | 27..378; 18..369 | 351/352 (99%); 352/352 (99%) | 0.0




PFam analysis predicts that the NOV49a protein contains the domains shown in the 
Table 49F. 



Table 49F. Domain Analysis of NOV49a 


Pfam Domain 


NOV49a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


GTP_CDC 


50..330


169/292 (58%) 
256/292 (88%) 


Me- 185

Example 50.

The NOV50 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 50A. 



Table 50A. NOV50 Sequence Analysis 




NOV50a, CG128369-01 DNA Sequence | SEQ ID NO: 189 | 1122 bp


GAGCGGGAAACTGGAGCTTAAATTCTGGCGGCGAGATGGACATTCTGAAATCAGAGATCCTTC 


GGAAGCGGCAGCTGGTGGAGGACAGGAACCTGCTGGTGGAAAATAAAAAATATTTCAAGCGTA 
GTGAGCTCGCAAAAAAAGAAGAGGAAGCATATTTTGAAAGATGTGGCTACAAGATACAGCCAA
AAGAGGAAGACACACAGAATGATCTGAAAGTTCATGAGGAAAACACCACAATTGAAGAGTTAG 
AGGCGCTTGGAGAGTCCTTAGGGAAAGGCGATGATCATAAAGACATGGACATCATCACCAAAT 
TCCTGAAGTTTCTTCTTGGCGTTTGGGCTAAAGAATTGAATGCCAGAGAAGATTATGTGAAAC 
GCAGTGTGCAGGGTAAACTGAACAGTGCGACCCAGAAACAGACCGAGTCCTACCTAAGACCAC
TTTTTAGAAAGCTACGGAAAAGGAATCTTCCTGCTGATATTAAAGAATCAATAACGGATATTA
TTAAATTCATGTTGCAGAGAGAATACGTGAAGGCAAATGATGCTTATCTTCAGATGGCCATTG
GAAATGCGCCTTGGCCCATCGGTGTCACTATGGTTGGTATCCATGCCAGAACTGGCAGAGAAA
AGATTTTTTCCAAGCATGTTGCACATGTTTTAAATGACGAAACTCAGCGGAAATATATTCAGG
GATTGAAGAGGTTAATGACCATTTGCCAGAAACACTTTCCTACAGACCCATCCAAATGTGTGG
AGTACAATGCACTGTGAGATCTGTGTATGGTGTGTTAATAACAATAAGAAACTTAGGGAAGCA 


GGCTGTGGACTTCTGGAATTACCAACAGGAATGAGGAAAGAAGAAAACTGGAGTTTCCAGTCT 


CTGAGTTCTACCTGATGTAACTCTTGATTGGTTTTAAGAACTTTGTTGGCCTTCATTTCATAT 


CTGACTGCAAGCTGATTTTTCTTTCTTGCTTTCATTTTAATTAGTCCAAAATTAAGTTTTAAA 


GATTTTTCCTCACAATTTAAATCCATAGACAACAGAAGGGGGTTTAAAATGACCTTTTTTTCA 


GTTGACCCGAAAGTTGTGGTTAGATGATTAAAAAGAAACATTTGAAAAAAA 




ORF Start: ATG at 36 | ORF Stop: TGA at 771

SEQ ID NO: 190 | 245 aa | MW at 28674.8kD

NOV50a, CG128369-01 Protein Sequence


MDILKSEILRKRQLVEDRNLLVENKKYFKRSELAKKEEEAYFERCGYKIQPKEEDTQNDLKVH 
EENTTIEELEALGESLGKGDDHKDMDIITKFLKFLLGVWAKELNAREDYVKRSVQGKLNSATQ 
KQTESYLRPLFRKLRKRNLPADIKESITDIIKFMLQREYVKANDAYLQMAIGNAPWPIGVTMV 
GIHARTGREKIFSKHVAHVLNDETQRKYIQGLKRLMTICQKHFPTDPSKCVEYNAL 




NOV50b, CG128369-02 DNA Sequence | SEQ ID NO: 191 | 809 bp


GAGCGGGAAACTGGAGCTTAAATTCTGGCGGCGAGATGGACATTCTGAAATCAGAGATCCTTC 


GGAAGCGGCAGCTGGTGGAGGACAGGAACCTGCTGGTGGAAAATAAAAAATATTTCAAGCGTA 
GTGAGCTCGCCAAAAAAGAAGAGGAAGCATATTTTGAAAGATGTGGCTACAAGATACAGCCAA 
AAGAGGAGGACCAGAAACCATTAACTTCATCGAATCCAGTGTTAGAACTTGAACTGGCAGAGG 
AAAAATTACCTATGACGCTTTCTAGGCAAGAGTTAGAGGCGCTTGGAGAGTCCTTAGGGAAAG 
GCGATGATCATAAAGACATGGACATCATCACCAAATTCCTGAAGTTTCTTCTTGGCGTTTGGG 
CTAAAGAATTGAATGCCAGAGAAGATTATGTGAAACGCAGTGTGCAGGGTAAACTGAACAGTG 
CGACCCAGAAACAGACCGAGTCCTACCTAAGACCACTTTTTAGAAAGCTACGGAAAAGGAATC 
TTCCTGCTGATATTAAAGAATCAATAACGGATATTATTAAATTCATGTTGCAGAGAGAATACG 
TGAAGGCAAATGATGCTTATCTTCAGATGGCCATTGGAAATGCGCCTTGGCCCATCGGTGTCA 
CTATGGTTGGTATCCATGCCAGAACTGGCAGAGAAAAGATTTTTTCCAAGCATGTTGCACATG 
TTTTAAATGACGAAACTCAGCGGAAATATATTCAGGGATTGAAGAGGTTAATGACCATTTGCC 
AGAAACACTTTCCTACAGACCCATCCAAATGTGTGGAGTACAATGCACTGTGA 




ORF Start: ATG at 36 ORF Stop: TGA at 807




SEQ ID NO: 192 257 aa MW at 29956.4kD



NOV50b,
CG128369-02 Protein
Sequence 



MDILKSEILRKRQLVEDRNLLVENKKYFKRSELAKKEEEAYFERCGYKIQPKEEDQKPLTSSN 
PVLELELAEEKLPMTLSRQELEALGESLGKGDDHKDMDIITKFLKFLLGVWAKELNAREDYVK 
RSVQGKLNSATQKQTESYLRPLFRKLRKRNLPADIKESITDIIKFMLQREYVKANDAYLQMAI 
GNAPWPIGVTMVGIHARTGREKIFSKHVAHVLNDETQRKYIQGLKRLMTICQKHFPTDPSKCV 
EYNAL 



SEQ ID NO: 193 



843 bp 



NOV50c, 

CG128369-03 DNA
Sequence 



GAGCGGGAAACTGGAGCTTAAATTCTGGCGGCGAATGGACATTCTGAAATCAGAGATCCTTCG



GAAGCGGCAGCTGGTGGAGGACAGGAACCTGCTGGTGGAAAATAAAAAATATTTCAAGCGTAG 
TGAGCTCGCCAAAAAAGAAGAGGAAGCATATTTTGAAAGATGTGGCTACAAGATACAGCCAAA
AGAGGAAAACACCACAATTGAAGAGTTAGAGGCGCTTGGAGAGTCCTTAGGGAAAGGCGATGA 
TCATAAAGACATGGACATCATCACCAAATTCCTGAAGTTTCTTCTTGGCGTTTGGGCTAAAGA
ATTGAATGCCAGAGAAGATTATGTGAAACGCAGTGTGCAGGGTAAACTGAACAGTGCGACCCA
GAAACAGACCGAGTCCTACCTAAGACCACTTTTTAGAAAGCTACGGAAAAGGAATCTTCCTGC 
TGATATTAAAGAATCAATAACGGATATTATTAAATTCATGTTGCAGAGAGAATACGTGAAGGC 
AAATGATGCTTATCTTCAGATGGCCATTGGAAATGCGCCTTGGCCCATCGGTGTCACTGTGGT 
TGGTATCCATGCCAGAACTGGCAGAGAAAAGATTTTTTCCAAGCATGTTGCACATGTTTTAAA 
TGGCGAAACTCAGCGGAAATATATTCAGGGATTGAAGAGGTTAATGACCATTTGCCAGAAACA 
CTTTCCTACAGACCCATCCAAATGTGTGGAGTACAATGCACTGTGAGATCTGTGTATGGTGTG 
TTAATAACAATAAGAAACTTAGGGAAGCAGGCTGTGGACTTCTGGAATTACCAACAGGAATGA 


GGAAAGAAGAAAACTGGAAGGGCG 




ORF Start: ATG at 35 




ORF Stop: TGA at 737 




SEQ ID NO: 194 


234 aa 


MW at 27275.3kD


NOV50c, 

CG128369-03 Protein
Sequence 


MDILKSEILRKRQLVEDRNLLVENKKYFKRSELAKKEEEAYFERCGYKIQPKEENTTIEELEA 
LGESLGKGDDHKDMDIITKFLKFLLGVWAKELNAREDYVKRSVQGKLNSATQKQTESYLRPLF 
RKLRKRNLPADIKESITDIIKFMLQREYVKANDAYLQMAIGNAPWPIGVTVVGIHARTGREKI
FSKHVAHVLNGETQRKYIQGLKRLMTICQKHFPTDPSKCVEYNAL 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 50B.



Table 50B. Comparison of NOV50a against NOV50b and NOV50c.


Protein Sequence 


NOV50a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV50b 


1..245 
1..257 


233/257 (90%) 
239/257 (92%) 


NOV50c 


1..245 

1..234


232/245 (94%) 
233/245 (94%) 
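The percentages reported next to each identity and similarity fraction can be recomputed from the fraction itself. The sketch below assumes that the reported value is the truncated (not rounded) whole-number percentage, which matches most entries in these tables (for example, 233/257 is reported as 90%); that convention is inferred from the data rather than stated in the text, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero.

    Assumption: the tables report truncated rather than rounded percentages.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# NOV50a against NOV50b: 233 identities and 239 similarities over 257 positions
print(percent_identity(233, 257), percent_identity(239, 257))
```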




Further analysis of the NOV50a protein yielded the following properties shown in 
Table 50C. 



Table 50C. Protein Sequence Properties NOV50a


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3600 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability 
located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
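A PSort result line of the form used in these tables can be reduced to its highest-probability location with a few lines of code. This is only an illustrative sketch: it assumes the entries are separated by semicolons and each reads "<probability> probability located in <location>", as in the table above, and the function name `most_likely_location` is hypothetical.

```python
from typing import Tuple


def most_likely_location(psort_line: str) -> Tuple[str, float]:
    """Return (location, probability) for the highest-scoring PSort entry."""
    best_location, best_probability = "", -1.0
    for entry in psort_line.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry is assumed to look like "0.4500 probability located in cytoplasm".
        probability_text, _, location = entry.partition(" probability located in ")
        probability = float(probability_text)
        if probability > best_probability:
            best_location, best_probability = location.strip(), probability
    return best_location, best_probability


psort_nov50a = (
    "0.4500 probability located in cytoplasm; "
    "0.3600 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen); "
    "0.0000 probability located in endoplasmic reticulum (membrane)"
)
print(most_likely_location(psort_nov50a))
```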



A search of the NOV50a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 50D.

Table 50D. Geneseq Results for NOV50a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date]


NOV50a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB71440 


Drosophila melanogaster polypeptide 
SEQ ID NO 41112 - Drosophila
melanogaster, 340 aa. [WO200171042-
A2, 27-SEP-2001]


32..242
124..333


110/211 (52%)
148/211 (70%)


2e-59 


AAG36822 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 45179 - Arabidopsis
thaliana, 301 aa. [EP1033405-A2, 06-
SEP-2000]


21..245
42..264


95/225 (42%) 
143/225 (63%) 


4e-45 


AAG36821 


Arabidopsis thaliana protein fragment
SEQ ID NO: 45178 - Arabidopsis
thaliana, 420 aa. [EP1033405-A2, 06-
SEP-2000]


21..245
161..383


95/225 (42%)
143/225 (63%)


4e-45 


AAG36820


Arabidopsis thaliana protein fragment
SEQ ID NO: 45177 - Arabidopsis
thaliana, 438 aa. [EP1033405-A2, 06-
SEP-2000]


21..245
179..401


95/225 (42%) 
143/225 (63%) 


4e-45 


AAG22300 


Arabidopsis thaliana protein fragment 
SEQ ID NO: 25174 - Arabidopsis
thaliana, 301 aa. [EP1033405-A2, 06-
SEP-2000]


21..245
42..264


95/229 (41%)
145/229 (62%)


5e-45 



In a BLAST search of public sequence databases, the NOV50a protein was found to
have homology to the proteins shown in the BLASTP data in Table 50E. 



Table 50E. Public BLASTP Results for NOV50a


Protein 

Accession 

Number 


Protein/Organism/Length


NOV50a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q99633 


HPRP18 (Pre-mRNA splicing factor
similar to S. CEREVISIAE PRP18) -
Homo sapiens (Human), 342 aa.


50..245
147..342


194/196 (98%)
195/196 (98%)


e-110


Q9JKB8 


Potassium channel regulatory factor - 
Rattus norvegicus (Rat), 342 aa. 


50..245 
147..342 


193/196 (98%) 
195/196 (99%) 


e-109 


Q9V437 


PRP18 protein - Drosophila
melanogaster (Fruit fly), 340 aa.


32..242
124..333


110/211 (52%)
148/211 (70%)


6e-59


AAM44364


POTASSIUM CHANNEL 
REGULATORY FACTOR - 
Dictyostelium discoideum (Slime 
mold), 389 aa.


22..240
181..388


109/224 (48%) 
136/224 (60%) 


2e-48 


Q9SA55 


F10O3.3 protein (Hypothetical 47.9
kDa protein) - Arabidopsis thaliana
(Mouse-ear cress), 420 aa.


21..245
161..383


95/225 (42%)
143/225 (63%)


1e-44
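The Expect values in these tables mix ordinary scientific notation (2e-48), an abbreviated form with no leading digit (e-110), and plain zero (0.0). The sketch below normalizes all three so that hits can be compared numerically; it assumes the abbreviated form means a mantissa of 1, which is the usual reading of BLAST report output but is an assumption here. The function name `parse_expect` is hypothetical.

```python
def parse_expect(text: str) -> float:
    """Convert a BLAST-style Expect string to a float.

    Assumption: an abbreviated value such as "e-110" denotes 1e-110.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


# Order three of the Table 50E hits from most to least significant.
hits = {"Q99633": "e-110", "Q9V437": "6e-59", "AAM44364": "2e-48"}
ranked = sorted(hits, key=lambda accession: parse_expect(hits[accession]))
print(ranked)
```

A smaller Expect value indicates a match that is less likely to have arisen by chance, so sorting in ascending order lists the strongest hit first.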



PFam analysis predicts that the NOV50a protein contains the domains shown in the 
Table 50F. 



Table 50F. Domain Analysis of NOV50a 


Pfam Domain 


NOV50a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


hormone2 


200..223 


5/24(21%) 
18/24 (75%) 


0.59 


Prp18 


91..235


102/148(69%) 
145/148(98%) 


1.2e-106 






Example 51. 

The NOV51 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 51A.



Table 51A. NOV51 Sequence Analysis




SEQ ID NO: 195 2468 bp


NOV51a,

CG128420-01 DNA
Sequence 


GTAAAGCTCTGCCATAAACTTCTAGCGTGTGCCAATGGAATTCCAGCTTCGTGTTTTGCTTTC 


CGCTCCTCGGAACATCCGGGAGAGTTGACTTCCGGCGGCTTGTGGGAGTGCTGGTTCTGTCCT 


CCTTGCGGGTGCGGAGATGGTTGTCTTGGTTACGGGTCCTAACGGTCCCCTGCCTTGAAATCC 


CTTGTTGAGGGCCTGCAACCTTGTGCTTCCGACTGGAGACGCCTTTGGTCCCTCGGTGTCTGC 


ACTGGCTGCTGGTCAAGGCTTCAGTGTGGAGTAATTGACACTTTCGAGATTGAAGAATTGGAG 


GAGAAACTTAATGATGCACTTCACCAGAAGCAGCTACTAACATTGAGATTAGACAACCAATTG 


GCTTTTCAACAGAAAGATGCCAGCAAATATCAAGAATTAATGAAACAAGAAATGGAAACCATT 


TTGTTGAGACAGAAACAACTAGAAGAGACAAATCTTCAGCTAAGAGAAAAAGCTGGAGATGTT 
CGTCGAAACCTGCGTGACTTTGAGTTGACAGAAGAGCAATATATTAAATTANAAGCTTTTCCT 
GT^AGATCAGCTTTCTATTCCTGAATATGTATCTGTTCGCTTCTATGAGCTAGTGAATCCATTA 
AGAAAGGAAATCTGTGAACTACAAGTGAAAAAGAATATCCTAGCAGAAGAATTAAGTACAAAC
AAAAACCAACTGAAGCAGCTGACAGAGACATATGAGGAAGATCGAAAAAACTACTCTGAAGTT 
CAAATTAGATGTCAACGTTTGGCCTTAGAATTAGCAGACACAAAACAGTTAATTCAGCAAGGT 
GACTACCGTCAAGAGAACTATGATAAAGTCAAGAGTGAACGTGATGCACTTGAACAGGAAGTA 
ATTGAGCTTAGGAGAAAACATGAAATACTTGAAGCCTCTCACATGATTCAAACAAAAGAACGA 
AGTGAATTATCAAAAGAGGTAGTCACCTTAGAGCAAACTGTTACTTTACTGCAAAAGGATAAA 
GAATATCTTAATCGCCAAAACATGGAGCTTAGTGTCTGCTGTGCTCATGAAGAGGATCGCCTT 
GAAAGACTTCAAGCTCAACTGGAAGAAAGCAAAAAGGCTAGAGAAGAGATGTATGAAAAATAT 
GTAGCATCCAGAGACCATTATAAAACAGAATATGAAAATAAACTACATGATGAACTAGAACAA 
ATCAGATTGAAAACCAACCAAGAAATTGATCAACTTCGAAATGCCTCTAGGGAAATGTATGAA 
CGAGAAAACAGAAATCTCCGAGAAGCAAGGGATAATGCTGTGGCTGAAAAGGAACGAGCAGTG
ATGGCTGAAAAGGATGCTTTAGAAAAACACGATCAGCTCTTAGACAGGTACAGAGAACTACAA 
CTTAGTACAGAAAGCAAAGTAACAGAATTTCTCCATCAAAGTAAATTAAAATCTTTTGAAAGT
GAGCGTGTTCAACTTCTGCAAGAGGAAACAGCAAGAAATCTCACACAGTGTCAATTGGAATGT 
GAAAAATATCAGAAAAAATTGGAGGTTTTAACCAAAGAATTTTATAGTCTCCAAGCCTCTTCT 
GAAAAACGCATTACTGAACTTCAAGCACAGAACTCAGAGCATCAAGCAAGGCTAGACATTTAT 
GAGAAACTGGAAAAAGAGCTTGATGAAATAATAATGCAAACTGCAGAAATTGAAAATGAAGAT 
GAGGCTGAAAGGGTTCTTTTTTCCTACGGCTATGGTGCTAATGTTCCCACAACAGCCAAAAGA 
CGACTAAAGCAAAGTGTTCACTTGGCAAGAAGAGTGCTTCAATTAGAAAAACAAAACTCGCTG 
ATTNTTAAAAGATCTGGAACATCGAAAGGACCAAGTAACACAGCTTTCACCAGGAGCTTGACA 
GAGGCCAATTCGCTATTAAACCAGACTCAACAGCCTTACAGGTATCTCATTGAATCAGTGCGT 
CAGAGAGATTCTAAGATTGATTCACTGACGGAATCTATTGCACAACTTGAGAAAGATGTCAGC 
AACTTAAATAAAGAAAAGTCAGCTTTACTACAGACGAAGAATCAAATGGCATTAGATTTAGAA 
CAACTTCTAAATCATCGTGAGGAATTGGCAGCAATGAAACAGATTCTCGTTAAGATGCATAGT 
AAACATTCTGAGAACAGCTTACTTCTCACTAAAACAGAACCAAAACATGTGACAGAAAATCAG 
AAATCAAAGACTTTGAATGTGCCTAAAGAGCATGAAGACAATATATTTACACCTAAACCAACA
CTCTTTACTAAAAAAGAAGCACCTGAGTGGTCTAAGAAACAAAAGATGAAGACCTAGTGTTTT 
GGATGGGAAGCACCTGTAGACCATTATATACTCCTGAAGTTCTTTTTCTGATGGAAAACAAAA 


TTCAGTTTAATCGTGTACTCAGCATTTTTTAAATAACAATGTTTATTTGAACTAATATTAAAT 


TAACAAATTCG 




ORF Start: ATG at 418


ORF Stop: TAG at 2323




SEQ ID NO: 196 


635 aa MW at 74956.9kD 


NOV51a,

CG128420-01 Protein
Sequence 


MKQEMETILLRQKQLEETNLQLREKAGDVRRNLRDFELTEEQYIKLXAFPEDQLSIPEYVSVR
FYELVNPLRKEICELQVKKNILAEELSTNKNQLKQLTETYEEDRKNYSEVQIRCQRLALELAD 
TKQLIQQGDYRQENYDKVKSERDALEQEVIELRRKHEILEASHMIQTKERSELSKEVVTLEQT
VTLLQKDKEYLNRQNMELSVCCAHEEDRLERLQAQLEESKKAREEMYEKYVASRDHYKTEYEN 
KLHDELEQIRLKTNQEIDQLRNASREMYERENRNLREARDNAVAEKERAVMAEKDALEKHDQL 
LDRYRELQLSTESKVTEFLHQSKLKSFESERVQLLQEETARNLTQCQLECEKYQKKLEVLTKE
FYSLQASSEKRITELQAQNSEHQARLDIYEKLEKELDEIIMQTAEIENEDEAERVLFSYGYGA 
NVPTTAKRRLKQSVHLARRVLQLEKQNSLIXKRSGTSKGPSNTAFTRSLTEANSLLNQTQQPY 
RYLIESVRQRDSKIDSLTESIAQLEKDVSNLNKEKSALLQTKNQMALDLEQLLNHREELAAMK
QILVKMHSKHSENSLLLTKTEPKHVTENQKSKTLNVPKEHEDNIFTPKPTLFTKKEAPEWSKK 
QKMKT 



Further analysis of the NOV51a protein yielded the following properties shown in
Table 51B.



Table 51B. Protein Sequence Properties NOV51a


PSort 
analysis: 


0.3000 probability located in microbody (peroxisome); 0.3000 probability located in 
nucleus; 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 






A search of the NOV51a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 51C.

Table 51C. Geneseq Results for NOV51a


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV51a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM80122 


Human protein SEQ ID NO 3768 - 
Homo sapiens, 690 aa. 
[WO200157190-A2, 09-AUG-2001]


1..559
129..689


551/561 (98%) 
554/561 (98%) 


0.0 


AAM79138 


Human protein SEQ ID NO 1800 - 
Homo sapiens, 612 aa.
[WO200157190-A2, 09-AUG-2001]


1..473
124..596


469/473 (99%) 
470/473 (99%) 


0.0 


AAW54095 


Homo sapiens TCL52 sequence - 
Homo sapiens, 312 aa. [WO9812327-
A2, 26-MAR-1998]


327..635 
5..312 


292/309 (94%) 
294/309 (94%) 


e-159 


AAG73667 



Human colon cancer antigen protein 
SEQ ID NO:4431 - Homo sapiens, 244
aa. [WO200122920-A2, 05-APR-2001]


315..551
1..238


209/239 (87%) 
213/239 (88%) 


e-105 


AAG03318 



Human secreted protein, SEQ ID NO: 
7399 - Homo sapiens, 53 aa. 
[EP1033401-A2, 06-SEP-2000]


566..617
1..52


52/52(100%) 
52/52(100%) 


2e-23 
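The residue ranges and the denominator of the identity fraction in each row are related: the denominator is the alignment length, and any excess over a span's own length corresponds to gap positions on that side. The sketch below illustrates the arithmetic under that assumption, which is the usual reading of BLAST-style output but is not spelled out in the text; the function names `span_length` and `implied_gaps` are hypothetical.

```python
from typing import Tuple


def span_length(span: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start_text, end_text = span.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


def implied_gaps(query_span: str, match_span: str, aligned_length: int) -> Tuple[int, int]:
    """Gap positions implied in the query and in the match.

    Assumption: aligned_length is the denominator of the identity fraction.
    """
    return (aligned_length - span_length(query_span),
            aligned_length - span_length(match_span))


# AAM80122 row: NOV51a residues 1..559, match residues 129..689, 551/561 identities
print(implied_gaps("1..559", "129..689", 561))
```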


In a BLAST search of public sequence databases, the NOV51a protein was found to
have homology to the proteins shown in the BLASTP data in Table 51D.



Table 51D. Public BLASTP Results for NOV51a


Protein
Accession
Number


Protein/Organism/Length 


NOV51a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O95664


PIBF1 protein - Homo sapiens 
(Human), 758 aa. 


1..635
124..758 


633/635 (99%) 
634/635 (99%) 


0.0 


Q8WXW3 


Progesterone-induced blocking 
factor 1 - Homo sapiens (Human), 
757 aa. 


1..635
124..757 


616/635 (97%) 
617/635 (97%) 


0.0 


Q96SF4 



BA555G22.2 (PIBF1 protein) - 
Homo sapiens (Human), 603 aa 
(fragment). 


1..473 
124..588


461/473(97%) 
462/473 (97%) 


0.0 


Q9CVX7 


1700017E21Rik protein - Mus 
musculus (Mouse), 323 aa 
(fragment). 


317..635
7..323


272/319(85%) 
288/319(90%) 


e-147 


Q9D551 


4930513H15Rik protein - Mus
musculus (Mouse), 250 aa. 


1..101
124..224


90/101 (89%) 
97/101 (95%) 


2e-44 



PFam analysis predicts that the NOV51a protein contains the domains shown in
Table 51E.



Table 51E. Domain Analysis of NOV51a


Pfam Domain


NOV51a Match Region


Identities/ 
Similarities 

for the Matched Region 


Expect Value 



Example 52.

The NOV52 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 52A. 



Table 52A. NOV52 Sequence Analysis


SEQ ID NO: 197


1108 bp



NOV52a,

CG128519-01 DNA
Sequence


GACCCGTGTTGGGATGGGAGCCCGAACCCCCAGGCCTGGGGCGGGCCTCAGGGACCAGCAAAT 
GGCCCCATCCGCTGCTCCTCAGGCCCCAGAAGCCTTCACACTCAAGGAGAAGGGGCACCTGCT 
GCGGCTGCCTGCGGCATTCAGGAAAGCAGCTTCCCAGAACTCGAGCCTGTGGGCCCAGCTCAG
TTCCACACAGACCAGTGATTCCACGGATGCCGCCGCTGCCAAAACCCAGTTCCTCCAGAACAT 
GCAGACAGCTTCAGGCGGGCCCCAGCCCAGGCTCAGTGCTGTGGAGGTGGAGGCGGAGGCGGG 
GCGCCTGCGGAAGGCCTGCTCGCTGCTGAGACTGCGCATGAGGGAGGAGCTCTCAGCAGCCCC 
CATGGACTGGATGCAGGAGTACCGCTGCCTGCTCACGCTGGAGGGGCTGCAGGCCATGGTGGG 
CCAGTGTCTGCACAGGCTGCAGGAGCTGCGTGCAGCGGTGGCGGAACAGCCACCAAGACCATG 
TCCTGTGGGGAGGCCCCCCGGAGCCTCGCCGTCCTGTGGGGGTAGAGCGGAGCCTGCATGGAG 
CCCCCAGCTGCTTGTCTACTCCAGCACCCAGGAGCTGCAGACCCTGGCGGCCCTCAAGCTGCG 
AGTGGCTGTGCTGGACCAGCAGATCCACTTGGAAAAGGTCCTGATGGCTGAACTCCTCCCCCT 
GGTAAGCGCTGCACAGCCGCAGGGGCCGCCCTGGCTGGCCCTGTGCCGGGCTGTGCACAGCCT 
GCTCTGCGAGGGAGGAGCACGTGTCCTTACCATCCTGCGGGATGAACCTGCAGTCTGAGCCTT 
TCCCATGCTGCCCTCGGCCTGTTCAGATGGGGATTGGGGGTGTCTTCCCTGGCACTGTGCTCG 


GGGACCCAGAGATGCCTGTGCTTCCCTGGGAAACCTGGTGAACTGGACCAGGTGGCCTCACTG 


GCTCTTCTCAGGACAACTAAGCCTGCTGGTCAGGGCTGGCTTTCAGCCTTCCTAAGGCTCCTG 


GACTCCAGAGGCCAGCGGGGAGCCTTTCCTGGCTCCCTCTGTTTTCTCTCACTGTAGACCAAA 


GAGCCGCTTGTGTGATATTAAAGCCACTTTAGAAAGC




ORF Start: ATG at 14 




ORF Stop: TGA at 812




SEQ ID NO: 198 266 aa MW at 28644.7kD


NOV52a,

CG128519-01 Protein
Sequence 


MGARTPRPGAGLRDQQMAPSAAPQAPEAFTLKEKGHLLRLPAAFRKAASQNSSLWAQLSSTQT 
SDSTDAAAAKTQFLQNMQTASGGPQPRLSAVEVEAEAGRLRKACSLLRLRMREELSAAPMDWM 
QEYRCLLTLEGLQAMVGQCLHRLQELRAAVAEQPPRPCPVGRPPGASPSCGGRAEPAWSPQLL 
VYSSTQELQTLAALKLRVAVLDQQIHLEKVLMAELLPLVSAAQPQGPPWLALCRAVHSLLCEG 
GARVLTILRDEPAV



SEQ ID NO: 199 831 bp


NOV52b,

CG128519-02 DNA
Sequence 


GACCCGTGTTGGGATGGGAGCCCGAACCCCCAGGCCTGGGGCGGGCCTCAGGAACCAGCAAAT 
GGCCCCATCCGCTGCTCCTCAGGCCCCAGAAGCCTTCACACTCAAGGAGAAGGGGCACCTGCT 
GCGGCTGCCTGCGGCATTCAGGAAAGCAGCTTCCCAGAACTCGAGCCTGTGGGCCCAGCTCAG 
TTCCACACAGACCAGTGATTCCACGGATGCCGCCGCTGCCAAAACCCAGTTCCTCCAGAACAT 
GCAGACAGCTTCAGGCGGGCCCCAGCCCAGGCTCAGTGCTGTGGAGGTGGAGGCGGAGGCGGG 
GCGCCTGCGGAAGGCCTGCTCGCTGCTGAGACTGCGCATGAGGGAGGAGCTCTCGGCAGCCCC
CATGGACTGGATGCAGGAGTACCGCTGCCTGCTCACGCTGGAGGGGCTGCAGGCCATGGTGGG 
CCAGTGTCTGCACAGGCTGCAGGAGCTGCGTGCAGCGGTGGCGGAACAGCCACCAAGACCATG 
TCCTGTGGGGAGGCCCCCCGGAGCCTCGCCGTCCTGTGGGGGTAGAGCGGAGCCTGCATGGAG 
CCCCCAGCTGCTTGTCTACTCCAGCACCCAGGAGCTGCAGACCCTGGCGGCCCTCAAGCTGCG 
AGTGGCTGTGCTGGACCAGCAGATCCACTTGGAAAAGGTCCTGATGGCTGAACTCCTCCCCCT
GGTAAGCGCTGCACAGCCGCAGGGGCCGCCCTGGCTGGCCCTGTGCCGGGCTGTGCACAGCCT 
GCTCTGCGAGGGAGGAGCACGTGTCCTTACCATCCTGCGGGATGAACCTGCAGTCTGAGCCTT 
TCCCATGCTGCC 




ORF Start: ATG at 14 


ORF Stop: TGA at 812




SEQ ID NO: 200 266 aa MW at 28643.8kD


NOV52b,

CG128519-02 Protein
Sequence 


MGARTPRPGAGLRNQQMAPSAAPQAPEAFTLKEKGHLLRLPAAFRKAASQNSSLWAQLSSTQT
SDSTDAAAAKTQFLQNMQTASGGPQPRLSAVEVEAEAGRLRKACSLLRLRMREELSAAPMDWM 
QEYRCLLTLEGLQAMVGQCLHRLQELRAAVAEQPPRPCPVGRPPGASPSCGGRAEPAWSPQLL 
VYSSTQELQTLAALKLRVAVLDQQIHLEKVLMAELLPLVSAAQPQGPPWLALCRAVHSLLCEG 
GARVLTILRDEPAV



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 52B. 



Table 52B. Comparison of NOV52a against NOV52b. 


Protein Sequence


NOV52a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV52b 


1..266 
1..266


212/266 (79%) 
213/266 (79%) 




Further analysis of the NOV52a protein yielded the following properties shown in 
Table 52C. 




Table 52C. Protein Sequence Properties NOV52a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.2413 probability located in lysosome 
(lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000 
probability located in endoplasmic reticulum (membrane) 


SignalP
analysis:


No Known Signal Sequence Predicted 



A search of the NOV52a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 52D.

Table 52D. Geneseq Results for NOV52a


Geneseq
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV52a 
Residues/ 
Match 
Residues 


Identities/
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB94803 



Human protein sequence SEQ ID 
NO: 15937 - Homo sapiens, 266 aa. 
[EP1074617-A2, 07-FEB-2001] 


1..266
1..266


266/266(100%) 
266/266(100%) 


e-151 


AAM50213


Human interleukin-1 ] -like AMF7 C- 
terminal polypeptide - Homo sapiens, 
435 aa. [WO200174897-A2, 11-OCT- 
2001] 


1..266
170..435 


266/266(100%) 
266/266(100%) 


e-151 


AAM94742 



Human reproductive system related 
antigen SEQ ID NO: 3400 - Homo 
sapiens, 156 aa. [WO200155320-A2, 
02-AUG-2001] 


33..186
1..155


134/155 (86%) 
137/155 (87%) 


4e-70 


AAU69744


Thermus thermophilus MutS DNA
mutation binding protein - Thermus
thermophilus, 819 aa. [WO200173079-
A2, 04-OCT-2001]


29..161
249..399 


47/151 (31%) 
64/151 (42%) 


0.006 


AAY44931


Mammalian adipose differentiation
associated protein - Mammalia, 286 aa.
[WO200006591-A2, 10-FEB-2000]


14..157
84..232 


41/157(26%) 
68/157 (43%) 


0.33 



In a BLAST search of public sequence databases, the NOV52a protein was found to
have homology to the proteins shown in the BLASTP data in Table 52E. 



Table 52E. Public BLASTP Results for NOV52a 


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV52a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


CAD10336 

Sequence 13 from Patent WO0174897
- Homo sapiens (Human), 435 aa
(fragment).


1..266
170..435


266/266(100%) 
266/266(100%) 


e-150 


Q9H872


CDNA FLJ13909 fis, clone
Y79AA1000065 (Hypothetical 28.6
kDa protein) - Homo sapiens
(Human), 266 aa.


1..266
1..266


266/266(100%) 
266/266(100%) 


e-150 


Q9DAZ6 


1600002H07Rik protein - Mus 
musculus (Mouse), 251 aa. 


17..265
1..250 


167/250 (66%) 
192/250(76%) 


3e-87


Q96H61


Similar to hypothetical protein
FLJ13909 - Homo sapiens (Human),
195 aa.


1..176
1..171


161/176 (91%) 
161/176(91%) 


2e-84 
S62790


mismatch DNA recognition protein
mutS [validated] - Thermus aquaticus,
818 aa (fragment).


29..161
248..398


47/151 (31%)
64/151 (42%)


0.017









PFam analysis predicts that the NOV52a protein contains the domains shown in the
Table 52F. 



Table 52F. Domain Analysis of NOV52a 



Pfam Domain 



NOV52a Match Region 



Identities/ 
Similarities 

for the Matched Region 



Expect Value 



Example 53. 

The NOV53 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 53A. 



Table 53A. NOV53 Sequence Analysis



SEQ ID NO: 201



1341 bp



NOV53a,

CG128626-01 DNA
Sequence 



ATGTCTCAGGTGACATTTAATGATGTGGCTATAGACTTCACTCATGAAGAGTGGGGATGGCTC 
AGTTCTGCTCAGAGGGACTTATACAAGGATGTGATGGTCCAGAATTATGAGAACCTGGTCTCT 
GTAGGTCTTTCTGTAACTAAGCCATATGTGATCACGTTATTGGAGGATGGAAAAGAGCCCTGG 
ATGATGGAGAAAAAACTGTCAAAAGGTATGATTCCAGATTGGGAATCAAGATGGGAAAACAAG 
GAATTATCAACAAAGAAGGATAATTATGATGAAGATTCACCCCAAACAGTAATAATAGAAAAA 
GTTGTAAAACAAAGTTATGAATTTTCAAATTCTAAGAAGAATTTGGAATATATAGAGAAGTTG 
GAAGGGAAGCATGGAAGTCAGGTAGACCATTTCAGACCAGCAATTCTCACCTCTAGAGAAAGC 
CCCACTGCAGACAGTGTTTACAAATACAATATATTTAGAAGCACCTTTCATTCAAAGTCTACT 
CTTTCTGAACCACAAAAAATTTCTGCTGAAGGGAATTCACACAAATATGATATATTAAAGAAG 
AACTTACCAAAAAAGTCAGTTATAAAAAATGAGAAAGTCAATGGTGGAAAGAAACTTTTGAAT
TCTAATAAAAGTGGGGCAGCCTTCAGCCAGGGCAAATCTCTTACCCTTCCCCAGACTTGTAAT 
AGAGAGAAAATCTATACATGCAGTGAATGTGGGAAAGCCTTTGGCAAACAGTCAATCCTCAAT 
CGCCACTGGAGAATTCATACAGGAGAGAAGCCCTATGAATGTCGTGAATGTGGGAAGACTTTT
AGCCATGGCTCATCCCTTACACGACATCTGATAAGCCATAGTGGAGAGAAACCTTACAAATGT
ATTGAATGTGGGAAGGCCTTTAGCCATGTCTCATCACTTACTAACCATCAGAGCACTCACACT 
GGAGAGAAACCATATGAATGTATGAACTGTGGAAAGTCTTTTAGTCGTGTGTCCCATCTTATT 
GAACATCTAAGAATTCATACTCAAGAAAAACTCTATGAGTGTCGTATATGTGGAAAGGCCTTC
ATTCATAGGTCATCTCTCATTCACCATCAGAAAATCCATACTGGAGAGAAGCCTTATGAATGT
AGAGAATGTGGGAAAGCTTTCTGCTGTAGCTCACACCTTACTCGACATCAAAGAATTCACACT 
ATGGAGAAACAATATGAATGCAACAAATGTCTGAAAGTCTTTAGTAGCCTCTCATTTCTTGTT 
CAGCATCAGAGTATTCATACTGAAGAAAAACCCTTGAAGTTTAGAAATGCAGGAAATCCTTCA 
ACCAGCTTGAATCACTGA 



ORF Start: ATG at 1 



ORF Stop: TGA at 1339



SEQ ID NO: 202



446 aa



MW at 51273.8kD



NOV53a,

CG128626-01 Protein
Sequence 



MSQVTFNDVAIDFTHEEWGWLSSAQRDLYKDVMVQNYENLVSVGLSVTKPYVITLLEDGKEPW 
MMEKKLSKGMIPDWESRWENKELSTKKDNYDEDSPQTVIIEKVVKQSYEFSNSKKNLEYIEKL
EGKHGSQVDHFRPAILTSRESPTADSVYKYNIFRSTFHSKSTLSEPQKISAEGNSHKYDILKK 
NLPKKSVIKNEKVNGGKKLLNSNKSGAAFSQGKSLTLPQTCNREKIYTCSECGKAFGKQSILN 
RHWRIHTGEKPYECRECGKTFSHGSSLTRHLISHSGEKPYKCIECGKAFSHVSSLTNHQSTHT 
GEKPYECMNCGKSFSRVSHLIEHLRIHTQEKLYECRICGKAFIHRSSLIHHQKIHTGEKPYEC 
RECGKAFCCSSHLTRHQRIHTMEKQYECNKCLKVFSSLSFLVQHQSIHTEEKPLKFRNAGNPS 
TSLNH
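The relationship between a nucleotide sequence and its encoded polypeptide can be checked by translating the ORF codon by codon. The following is a minimal sketch using the standard genetic code; it is not the method used to generate the sequences in this document. The names `CODON_TABLE` and `translate` are hypothetical, and rendering any codon that contains an ambiguous base as X is a choice made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # tolerate stray whitespace
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First 30 bases of the NOV53a ORF (SEQ ID NO: 201)
print(translate("ATGTCTCAGGTGACATTTAATGATGTGGCT"))
```

Stripping whitespace before translation is convenient for text in which bases have been separated by spaces.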

Further analysis of the NOV53a protein yielded the following properties shown in 
Table 53B. 



Table 53B. Protein Sequence Properties NOV53a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV53a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 53C.



Table 53C. Geneseq Results for NOV53a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date]


NOV53a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


ABB50180


Human transcription factor TRFX-31 -
Homo sapiens, 570 aa.
[WO200172777-A2, 04-OCT-2001]


1..445
1..448


436/448 (97%) 
437/448 (97%) 


0.0 



AAU16110 


Human novel secreted protein, Seq ID 
1063 - Homo sapiens, 585 aa. 
[WO200155322-A2, 02-AUG-2001]


1..445 
16..463 


416/448 (92%) 
417/448 (92%) 


0.0


AAU87342 


Novel central nervous system protein 

#252 - Homo sapiens, 398 aa. 

[WO200155318-A2, 02-AUG-2001]


1..380
16..395


377/380 (99%)
378/380 (99%)


0.0


ABG27051 


Novel human diagnostic protein 
#27042 - Homo sapiens, 783 aa. 
[WO200175067-A2, 11-OCT-2001]


60..445 
12..400 


376/389 (96%) 
377/389 (96%) 


0.0 


ABB50239 


Human transcription factor TRFX-90 - 
Homo sapiens, 399 aa. 
[WO200172777-A2, 04-OCT-2001]


1..433 
1..398 


360/433 (83%) 
375/433 (86%) 


0.0 
In a BLAST search of public sequence databases, the NOV53a protein was found to
have homology to the proteins shown in the BLASTP data in Table 53D.






WO 03/023002 PCT/US02/28539



Table 53D. Public BLASTP Results for NOV53a


Protein
Accession
Number


Protein/Organism/Length


NOV53a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Portion


Expect
Value


Q9P0J4 


Zinc finger protein ZNF140-like protein 
- Homo sapiens (Human), 411 aa. 


1..446 
1..411


371/446 (83%) 
386/446 (86%) 


0.0 


Q9BZD5


ZNF140-like transcription factor (Zinc 
finger protein 302) - Homo sapiens 
(Human), 399 aa. 


1..433
1..398 


360/433 (83%) 
375/433 (86%) 


0.0 


Q9NRII 


ZNF135-like protein - Homo sapiens 
(Human), 478 aa. 


1..433
1..477


336/485 (69%)
365/485 (74%)


0.0 


Q96ND8 


CDNA FLJ31030 fis, clone
HSYRA1000137, moderately similar to
zinc finger protein 184 - Homo sapiens
(Human), 569 aa.


4..438
6..442


210/437 (48%)
287/437 (65%)


e-120 


Q96NI8 



CDNA FLJ30791 fis, clone 
FEBRA2000972, moderately similar to 
zinc finger protein 184 - Homo sapiens 
(Human), 536 aa. 


4..438
14..448


213/437 (48%)
280/437 (63%)


e-116



PFam analysis predicts that the NOV53a protein contains the domains shown in the 
Table 53E. 



Table 53E. Domain Analysis of NOV53a

Pfam Domain 


NOV53a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


KRAB 


4..66 


42/66 (64%) 
54/66 (82%) 


2.6e-36 


zf-C2H2 


236..258


12/24 (50%) 
20/24 (83%) 


9.4e-05 


zf-C2H2 


264..286 


13/24 (54%) 
20/24 (83%) 


3.5e-06 


zf-BED 


249..287


14/52 (27%) 
29/52 (56%) 


0.084 


zf-C2H2 


292..314


13/24(54%) 
20/24 (83%) 


0.00014 


zf-C2H2 


320..342 


15/24 (62%) 
18/24 (75%) 


5.3e-06 


zf-BED 


305..343 


15/52 (29%) 
25/52 (48%) 


0.38 
zf-C2H2


348..370


10/24 (42%)
20/24 (83%)


5.2e-06 


zf-C2H2 


376..398


12/24 (50%)
21/24 (88%)


0.00092 


zf-C2H2 


404..426


10/24 (42%)
18/24 (75%)


0.00039



Example 54. 

The NOV54 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 54A. 



Table 54A. NOV54 Sequence Analysis 



SEQ ID NO: 203 



3078 bp



NOV54a,

CG128852-01 DNA
Sequence 



ATGTCGGCAGCCAAGGAGAACCCGTGCAGGAAATTCCAGGCCAACATCTTCAACAAGAGCAAG 

TGTCAGAACTGCTTCAAGCCCCGCGAGTCGCATCTGCTCAACGACGAGGACCTGACGCAGGCA 

AAACCCATTTATGGCGGTTGGCTGCTCCTGGCTCCAGATGGGACCGACTTTGACAACCCAGTG 

CACCGGTCTCGGAAATGGCAGCGACGGTTCTTCATCCTTTACGAGCACGGCCTCTTGCGCTAC 

GCCCTGGATGAGATGCCCACGACCCTTCCTCAGGGCACCATCAACATGAACCAGTGCACAGAT 

GTGGTGGATGGGGAGGGCCGCACGGGCCAGAAGTTCTCCCTGTGTATTCTGACGCCTGAGAAG 

GAGCATTTCATCCGGGCGGAGACCAAGGAGATCGTCAGTGGGTGGCTGGAGATGCTCATGGTC

TATCCCCGGACCAACAAGCAGAATCAGAAGAAGAAACGGAAAGTGGAGCCCCCCACACCACAG

GAGCCTGGGCCTGCCAAGGTGGCTGTTACCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGC 

ATCCCCAGTGCTGAGAAAGTCCCCACCACCAAGTCCACACTCTGGCAGGAAGAAATGAGGACC 

AAGGACCAGCCAGATGGCAGCAGCCTGAGTCCAGCTCAGAGTCCCAGCCAGAGCCAGCCTCCT 

GCTGCCAGCTCCCTGCGGGAACCTGGGCTAGAGAGCAAAGAAGGTGAGAGCGCCATGAGTAGC 

GACCGCATGGACTGTGGCCGCAAAGTCCGGGTGGAGAGCGGCTACTTCTCTCTGGAGAAGACC 

AAACAGGACTTGAAGGCTGAAGAACAGCAGCTGCCCCCGCCGCTCTCCCCTCCCAGCCCCAGC 

ACCCCCAACCACAGGAGGTCCCAGGTGATTGAAAAGTTTGAGGCCTTGGACATTGAGAAGGCA 

GAGCACATGGAGACCAATGCAGTGGGGCCCTCACCATCCAGCGACACACGCCAGGGCCGCAGC 

GAGAAGAGGGCGTTCCCTAGGAAGCGGGACTTCACCAATGAAGCCCCCCCAGCTCCTCTCCCA

GACGCCTCGGCTTCCCCCCTGTCTCCACACCGAAGAGCCAAGTCACTGGACAGGAGGTCCACG

GAGCCCTCCGTGACGCCCGACCTGCTGAATTTCAAGAAAGGCTGGCTGACTAAGCAGTATGAG

GACGGCCAGTGGAAGAAACACTGGTTTGTCCTCGCCGATCAAAGCCTGAGATACTACAGGGAT

TCAGTGGCTGAGGAGGCAGCCGACTTGGATGGAGAAATTGACTTGTCCGCATGTTACGATGTC

ACAGAGTATCCAGTTCAGAGAAACTATGGCTTCCAGATACATACAAAGGAGGGCGAGTTTACC 

CTGTCGGCCATGACATCTGGGATTCGGCGGAACTGGATCCAGACCATCATGAAGCACGTGCAC 

CCGACCACTGCCCCGGATGTGACCAGCTCGTTGCCAGAGGAAAAAAACAAGAGCAGCTGCTCT

TTTGAGACCTGCCCGAGGCCTACTGAGAAGCAAGAGGCAGAGCTGGGGGAGCCGGACCCTGAG 

CAGAAGAGGAGCCGCGCACGGGAGCGGAGGCGAGAGGGCCGCTCCAAGACCTTTGACTGGGCT 

GAGTTCCGTCCCATCCAGCAGGCCCTGGCTCAGGAGCGGGTGGGCGGCGTGGGGCCTGCTGAC 

ACCCACGAGCCCCTGCGCCCTGAGGCGGAGCCTGGGGAGCTGGAGCGGGAGCGTGCACGGAGG 

CGGGAGGAGCGCCGCAAGCGCTTCGGGATGCTCGACGCCACAGACGGGCCAGGCACTGAGGAT 

GCAGCCCTGCGCATGGAGGTGGACCGGAGCCCAGGGCTGCCTATGAGCGACCTCAAAACGCAT 

AACGTCCACGTGGAGATTGAGCAGCGGTGGCATCAGGTGGAGACCACACCTCTCCGGGAAGAG 

AAGCAGGTGCCCATCGCGCCCGTCCACCTGTCTTCTGAAGATGGGGGTGACCGGCTCTCCACA 

CACGAGCTGACCTCTCTGCTCGAGAAGGAGCTGGAGCAGAGCCAGAAGGAGGCCTCAGACCTT 

CTGGAGCAGAACCGGCTCCTGCAGGACCAGCTGAGGGTGGCCCTGGGCCGGGAGCAGAGCGCC

CGTGAGGGCTACGTGCTGCAGGCCACGTGCGAGCGAGGGTTTGCAGCAATGGAAGAAACGCAC 

CAGAAGAAGATTGAAGATCTCCAGAGGCAGCACCAGCGGGAGCTAGAGAAACTTCGAGAAGAG 

AAAGACCGCCTCCTAGCCGAGGAGACAGCGGCCACCATCTCAGCCATCGAAGCCATGAAGAAC 

GCCCACCGGGAGGAAATGGAGCGGGAGCTGGAGAAGAGCCAGCGGTCCCAGATCAGCAGCGTC 

AACTCGGATGTTGAGGCCCTGCGGCGCCAGTACCTGGAGGAGCTGCAGTCGGTGCAGCGGGAA

CTGGAGGTCCTCTCGGAGCAGTACTCGCAGAAGTGCCTGGAGAATGCCCATCTGGCCCAGGCG

CTGGAGGCCGAGCGGCAGGCCCTGCGGCAGTGCCAGCGTGAGAACCAGGAGCTCAATGCCCAC

AACCAGGAGCTGAACAACCGCCTGGCTGCAGAGATCACACGGTTGCGGACGCTGCTGACTGGG 

GACGGCGGTGGGGAGGCCACTGGGTCACCCCTTGCACAGGGCAAGGATGCCTATGAACTAGAG 
GTCTTATTGCGGGTAAAGGAATCGGAAATACAGTACCTGAAACAGGAGATTAGCTCCCTCAAG
GATGAGCTGCAGACGGCACTGCGGGACAAGAAGTACGCAAGTGACAAGTACAAAGACATCTAC 
ACAGAGCTCAGCATCGCGAAGGCTAAGGCTGACTGTGACATCAGCAGGTTGAAGGAGCAGCTC 
AAGGCTGCAACGGAAGCACTGGGGGAGAAGTCCCCTGACAGTGCCACGGTGTCCGGATATGAT 
ATAATGAAATCTAAAAGCAACCCTGACTTCTTGAAGAAAGACAGATCCTGTGTCACCCGGCAA 
CTCAGAAACATCAGGTCCAAGTCCGTAATTGAGCAGGTCTCGTGGGATACCTGA 



ORF Start: ATG at 1



ORF Stop: TGA at 3076



SEQ ID NO: 204



1025 aa



MW at 116459.8kD



NOV54a,

CG128852-01 Protein
Sequence



MSAAKENPCRKFQANIFNKSKCQNCFKPRESHLLNDEDLTQAKPIYGGWLLLAPDGTDFDNPV
HRSRKWQRRFFILYEHGLLRYALDEMPTTLPQGTINMNQCTDVVDGEGRTGQKFSLCILTPEK
EHFIRAETKEIVSGWLEMLMVYPRTNKQNQKKKRKVEPPTPQEPGPAKVAVTSSSSSSSSSSS 
IPSAEKVPTTKSTLWQEEMRTKDQPDGSSLSPAQSPSQSQPPAASSLREPGLESKEGESAMSS 
DRMDCGRKVRVESGYFSLEKTKQDLKAEEQQLPPPLSPPSPSTPNHRRSQVIEKFEALDIEKA 
EHMETNAVGPSPSSDTRQGRSEKRAFPRKRDFTNEAPPAPLPDASASPLSPHRRAKSLDRRST 
EPSVTPDLLNFKKGWLTKQYEDGQWKKHWFVLADQSLRYYRDSVAEEAADLDGEIDLSACYDV 
TEYPVQRNYGFQIHTKEGEFTLSAMTSGIRRNWIQTIMKHVHPTTAPDVTSSLPEEKNKSSCS 
FETCPRPTEKQEAELGEPDPEQKRSRARERRREGRSKTFDWAEFRPIQQALAQERVGGVGPAD 
THEPLRPEAEPGELERERARRREERRKRFGMLDATDGPGTEDAALRMEVDRSPGLPMSDLKTH
NVHVEIEQRWHQVETTPLREEKQVPIAPVHLSSEDGGDRLSTHELTSLLEKELEQSQKEASDL
LEQNRLLQDQLRVALGREQSAREGYVLQATCERGFAAMEETHQKKIEDLQRQHQRELEKLREE 
KDRLLAEETAATISAIEAMKNAHREEMERELEKSQRSQISSVNSDVEALRRQYLEELQSVQRE 
LEVLSEQYSQKCLENAHLAQALEAERQALRQCQRENQELNAHNQELNNRLAAEITRLRTLLTG 
DGGGEATGSPLAQGKDAYELEVLLRVKESEIQYLKQEISSLKDELQTALRDKKYASDKYKDIY 
TELSIAKAKADCDISRLKEQLKAATEALGEKSPDSATVSGYDIMKSKSNPDFLKKDRSCVTRQ 
LRNIRSKSVIEQVSWDT 



SEQ ID NO: 205



3990 bp



NOV54b,
CG128852-02 DNA
Sequence



AGCCTGGAGCACTCCTACCAGAGGGTCTCCAGCCAGCTGCAGAGCATGCACACTCTGCTGAGA



GAGAAGGAGGAAGAGCTGGAGCGCATTAAGGAAGCACATGAGAAGGTTCTGGAGAAGAAGGAG 
CAGGACCTCAATGAGGCTTTGGTTAAAATGGTTGCCTTGGGGAGCAGCTTAGAGGAAACAGAA 
ATTAAGCTCCAGGCAAAAGAAGAGATTTTAAGGAAATTTGCAAGTGAATCTCCAAAGGACATG 
GAAGAGCCACGGAGTACCCCTGAAGAGACAGAAAGGGATGGCACTTTGCTCCCAGGCCAACCA 
GTCCAAGCCACTAGGGCACCTCTAGGCCTCCCACACACAAGGCTCGAGGATGAGGACGAGGAC 
CTGGGGGCTCCTCCGGGGGAAGAGTACGGTGATGGCAGCCCCAGTAGGGAAGACAGCATGGTG
CCCCCAAAGTCAGTGGAAGTGCTTGACAGGGAGGGCCATCAGCAGGGCACAGCCAAACTCGAC 
CAAGGGGCACCTGGTGTTAAAAGGCAAAGAATCCGGTTCTCCACAATCCAGTGCCAAAGATAC 
ATTCACCCCGAAGGGTCTGAGAAGACCTGGACCAGCAGCACATCTTCCGACACCAGCCAGGAC 
CGGTCACCCTCGGAAGAAAGCATGTCCTCAGAGCCTGCACCCAGTGTACTGCCTGCAACTGGC 
GACTCTGACACGTACCTCTCCATCATCCACTCCCTGGAGACCAAGCTCTACGTCACAGAGGAA
AAGCTCAAAGACGTGACCGTGAGGCTGGAGAGCCAGCAGGGTCAGAGCCGTGAGGCACTGCTC 
GCACTGCACCACCAGTGGGCGGGCACCGAGGCCCAGCTGCGTGAGCAGCTCCGCGCCAGCCTG 
CTCCAGGTTGGCGCACTGGCCTCCCAGCTGGAGCAGGAGAGGCAGGAGAGGGCCAGGAGGGTT 
GAAGGGCATGTTGGAGAGCTTGGGGACTTCCAGGTCAAGAATAGTCAGGCCCTGATGTGCCTG 
GAAAATTGCCGAGAACAACTGAGATCTCTGCCTAGGGCCAGCCAGGAGGATGAGCAGGACGCA
CGCGCAGCCTCCCTGGCCAGTGTGGAGAGTGCACTCGTCAGCGCCATCCAAGCCCTGCAGCAC 
TGGCCGGCCCCAGCCCATGGCGGGGCCCGTGCACAGCTGGAGACAGGTGGCACCGAGGAGAAT
GGGAAGCCTGCCTCCCTGCAGCAGTGCTCCCAGTCTGAGTTGACAGAGCAGGAGCAGGTGAGG 
CTTCTTTCTGACCAGATTGCTCTGGAGGCCTCGCTGATCAGCCAGATAGCAGATTCCCTGAAG 
AACACAACATCAGATGTCTCCCGAATGCTCCATGAGATTTCTTGGTCAGGACAGCCACCGATG 
GAATCTGCTGGGGCCCCCGTAGACACCTGGGCCAGGAAGGTCCTAGTGGATGGTGAGTTCTGG
AGCCAGGTTGAGTCTCTGAGGAAGCACTTGGGGACACTGGGAGGAGAGGCAGTCGGTGCCTCA 
GGAGACGGGCAGCAAAGCATCCCACAGGGCCTGGCCCCCATCCTGGCCAATGCCACATGGGTC 
AGGGCAGAGCTCAGCTTTGCCACACAGTCAGTGAGGGAGTCGTTCCACCGCAGGCTACAGAGC 
ATCCAGGAGACCCTGCGGGGCACCCAGACGGCCCTGCGGCAGCACAAATGCCTGCTGAGGGAA
ATCCTGGGAGCCTACCAAACCCCAGACTTTGAAAGAGTGATGCAGCAGGTCTTGGAAGCCCTC
AGGCTTCCAGCGGGCCATGAAGATGGTGTTCAGCTGTCCTGGGACCTGAGCCCCTTAGGAGAA 
GTCCTGGGCCGAGACTCAGACAGCTCTCAGGAGCCCTTCGATGTGTCTGACCAGAGCCCTGGG 
GCCTTTGTTGCTATTCAGGAGGAGCTTGCCCAGCAGCTGAAGGAGAAGGCCAGCCTCTTAGAG 
GAGATAGCGGCTGCCTTACCATCTCTGCCACCTGTGGAATCGCTGAGAGATTGCCAGAAGCTT
CTCCAGGTGTCCCAGAGTCTCTCGTATAACACTTGTTTGGGAGGCCTCGGTCAGTATTCTTCA 
TTGTTGGTTCAGGATGCCATTATTCAGGCCCAGGTTTGCTATGCGTCCTGCAGAATCCGGCTA 
GAATATGAGAAGGAGCTCCAGCTCTGCAAGGAGTCCTGGCAAACCCGGGAGCCCTCCTGCTCA 
GAGCAGGCACAGGCAGCCCGGGCCCTGAGGGAGGAGTATGAGGAGCTTCTCCGCAAGCAGAAG 
AGCGAGTACCTGGATGTGATCGCCATTGTTGAAAGGGAGAATGCAGAGCTCAAGGCCAAGGCC 
GCCCAGCTAGACCATCAGCAGCAGTGTCTGGAGGATGCAGAGAGCAAGCACAGCATGAGCATG 
TTCACCCTGCGGGGCAGGTATGAGGAGGAGATTCGGTGTGTGGTGGAGCAGCTGACCAGGACC
GAGAGCACACTGCAGGCTGAGCGCAGCCGGGTCCTGAGCCAGCTGGATGCCTCGGTCAGAGAC
AGGCAGGACATGGAGAGGCATCATGGTGAGCAGATACAGACCCTGGAGGACAGGTTCCAGCTC 
AAGGTCCGGGAGCTGCAGACGATCCACGAGGAGGAGCTGAGGACCCTGCAGGAGCACTACTCG 
CAGAGCCTGAGGTGCCTTCAGGACACCCTCTGCCTCCACCAGGGGCCACACCCCAAGGCCCTG 
CCAGCCCCTGCCCCCAACTGGCAGGCCACCCAGGGAGAGGCTGACTCCATGACGGGGCTGAGG 
GAGCGCATCCAGGAGCTGGAGGCCCAGATGGATGTCATGCGGGAGGAGCTGGGACACAAGGAC 
CTGGAGGGCGACGCGGCCACACTGCGTGAGAAGTACCAGAGGGACTTGGAGAGCCTTAAGGCC 
ACGTGCGAGCGAGGGTTTGCAGCAATGGAAGAAACGCACCAGAAGAAGATTGAAGATCTCCAG 
AGGCAGCACCAGCGGGAGCTAGAGAAACTTCGAGAAGAGAAAGACCGCCTCCTAGCCGAGGAG 
ACAGCGGCCACCATCTCAGCCATCGAAGCCATGAAGAACGCCCACCGGGAGGAAATGGAGCGG 
GAGCTGGAGAAGAGCCAGCGGTCCCAGATCAGCAGCGTCAACTCGGATGTTGAGGCCCTGCGG 
CGCCAGTACCTGGAGGAGCTGCAGTCGGTGCAGCGGGAACTGGAGGTCCTCTCGGAGCAGTAC
TCGCAGAAGTGCCTGGAGAATGCCCATCTGGCCCAGGCGCTGGAGGCCGAGCGGCAGGCCCTG 
CGGCAGTGCCAGCGTGAGAACCAGGAGCTCAATGCCCACAACCAGGAGCTGAACAACCGCCTG 
GCTGCAGAGATCACACGGTTGCGGACGCTGCTGACTGGGGACGGCGGTGGGGAGGCCACTGGG 
TCACCCCTTGCACAGGGCAAGGATGCCTATGAACTAGAGGTCTTATTGCGGGTAAAGGAATCG 
GAAATACAGTACCTGAAACAGGAGATTAGCTCCCTCAAGGATGAGCTGCAGACGGCACTGCGG 
GACAAGAAGTACGCAAGTGACAAGTACAAAGACATCTACACAGAGCTCAGCATCGCGAAGGCT 
AAGGCTGACTGTGACATCAGCAGGTTGAAGGAGCAGCTCAAGGCTGCAACGGAAGCACTGGGG 
GAGAAGTCCCCTGACAGTGCCACGGTGTCCGGATATGATATAATGAAATCTAAAAGCAACCCT
GACTTCTTGAAGAAAGACAGATCCTGTGTCACCCGGCAACTCAGAAACATCAGGTCCAAGAGT 
CTGAAGGAAGGCCTGACGGTGCAAGAACGGTTGAAGCTCTTTGAATCCAGGGACTTGAAGAAA 
GACTAGGTGTGTCCCATCCAAGTTGAGCACGCGCCTTCCCCAGCTTGCAGCAGCACACCCCAA 




GCGCTGCTTTTCACCTGTACCTTTGTTTTATTATTATTATTATTATTGCTGTTGTTGTCATCG 




TTAACTGTGGGCATGGAATGC






ORF Start: ATG at 46
ORF Stop: TAG at 3847

SEQ ID NO: 206    1267 aa    MW at 142689.7kD

NOV54b,
CG128852-02 Protein
Sequence


MHTLLREKEEELERIKEAHEKVLEKKEQDLNEALVKMVALGSSLEETEIKLQAKEEILRKFAS 
ESPKDMEEPRSTPEETERDGTLLPGQPVQATRAPLGLPHTRLEDEDEDLGAPPGEEYGDGSPS 
REDSMVPPKSVEVLDREGHQQGTAKLDQGAPGVKRQRIRFSTIQCQRYIHPEGSEKTWTSSTS 
SDTSQDRSPSEESMSSEPAPSVLPATGDSDTYLSIIHSLETKLYVTEEKLKDVTVRLESQQGQ 
SREALLALHHQWAGTEAQLREQLRASLLQVGALASQLEQERQERARRVEGHVGELGDFQVKNS 
QALMCLENCREQLRSLPRASQEDEQDARAASLASVESALVSAIQALQHWPAPAHGGARAQLET 
GGTEENGKPASLQQCSQSELTEQEQVRLLSDQIALEASLISQIADSLKNTTSDVSRMLHEISW 
SGQPPMESAGAPVDTWARKVLVDGEFWSQVESLRKHLGTLGGEAVGASGDGQQSIPQGLAPIL 
ANATWVRAELSFATQSVRESFHRRLQSIQETLRGTQTALRQHKCLLREILGAYQTPDFERVMQ
QVLEALRLPAGHEDGVQLSWDLSPLGEVLGRDSDSSQEPFDVSDQSPGAFVAIQEELAQQLKE 
KASLLEEIAAALPSLPPVESLRDCQKLLQVSQSLSYNTCLGGLGQYSSLLVQDAIIQAQVCYA 
SCRIRLEYEKELQLCKESWQTREPSCSEQAQAARALREEYEELLRKQKSEYLDVIAIVERENA 
ELKAKAAQLDHQQQCLEDAESKHSMSMFTLRGRYEEEIRCVVEQLTRTESTLQAERSRVLSQL
DASVRDRQDMERHHGEQIQTLEDRFQLKVRELQTIHEEELRTLQEHYSQSLRCLQDTLCLHQG 
PHPKALPAPAPNWQATQGEADSMTGLRERIQELEAQMDVMREELGHKDLEGDAATLREKYQRD 
LESLKATCERGFAAMEETHQKKIEDLQRQHQRELEKLREEKDRLLAEETAATISAIEAMKNAH 
REEMERELEKSQRSQISSVNSDVEALRRQYLEELQSVQRELEVLSEQYSQKCLENAHLAQALE 
AERQALRQCQRENQELNAHNQELNNRLAAEITRLRTLLTGDGGGEATGSPLAQGKDAYELEVL 
LRVKESEIQYLKQEISSLKDELQTALRDKKYASDKYKDIYTELSIAKAKADCDISRLKEQLKA 
ATEALGEKSPDSATVSGYDIMKSKSNPDFLKKDRSCVTRQLRNIRSKSLKEGLTVQERLKLFE 
SRDLKKD 
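
The ORF coordinates and protein lengths reported in the sequence tables can be cross-checked arithmetically. The following is a minimal Python sketch, assuming (as the listed values suggest) that positions are 1-based and that the reported stop position is the first base of the stop codon. The function name `orf_protein_length` is hypothetical and is not part of the original disclosure.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Number of codons between a 1-based ATG position and the 1-based
    position of the first base of the stop codon (assumed convention)."""
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Coordinates taken from the NOV54a and NOV54b entries above.
print(orf_protein_length(1, 3076))   # 1025
print(orf_protein_length(46, 3847))  # 1267
```

Under these assumptions the computed lengths agree with the 1025 aa and 1267 aa values listed for NOV54a and NOV54b.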



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 54B. 



Table 54B. Comparison of NOV54a against NOV54b.

Protein Sequence | NOV54a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV54b | 710..1019; 939..1248 | 240/310 (77%); 244/310 (78%)
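
The identity entries in these comparison tables pair a match count with a percentage. As a minimal sketch, the Python below reproduces the identity percentages under the assumption that they are integer-truncated (an observation from the listed identity values such as 240/310 and 937/1027, not a documented property of the alignment software). The function names are hypothetical, and the similarity percentages are not covered because they are not always reproducible from the printed fractions.

```python
def identity_percent(matches: int, aligned: int) -> int:
    """Integer-truncated percent identity (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


def format_identity(matches: int, aligned: int) -> str:
    """Render a table cell in the 'matches/aligned (NN%)' style used above."""
    return f"{matches}/{aligned} ({identity_percent(matches, aligned)}%)"


print(format_identity(240, 310))  # 240/310 (77%)
```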

Further analysis of the NOV54a protein yielded the following properties shown in
Table 54C.



Table 54C. Protein Sequence Properties NOV54a

PSort analysis: 0.7000 probability located in nucleus; 0.3000 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV54a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 54D.



Table 54D. Geneseq Results for NOV54a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV54a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABB57200 | Mouse ischaemic condition related protein sequence SEQ ID NO:488 - Mus musculus, 1024 aa. [WO200188188-A2, 22-NOV-2001] | 1..1024; 1..1023 | 937/1027 (91%); 967/1027 (93%) | 0.0
AAM40793 | Human polypeptide SEQ ID NO 5724 - Homo sapiens, 612 aa. [WO200153312-A1, 26-JUL-2001] | 347..919; 1..573 | 557/573 (97%); 559/573 (97%) | 0.0
ABG10511 | Novel human diagnostic protein #10502 - Homo sapiens, 448 aa. [WO200175067-A2, 11-OCT-2001] | 591..1019; 1..429 | 419/429 (97%); 423/429 (97%) | 0.0
ABG10512 | Novel human diagnostic protein #10503 - Homo sapiens, 1201 aa. [WO200175067-A2, 11-OCT-2001] | 710..1019; 871..1182 | 286/312 (91%); 291/312 (92%) | e-152
AAB41888 | Human ORFX ORF1652 polypeptide sequence SEQ ID NO:3304 - Homo sapiens, 233 aa. [WO200058473-A2, 05-OCT-2000] | 342..574; 1..233 | 232/233 (99%); 232/233 (99%) | e-137
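
Each result row lists a residue range on the query and on the matched sequence, written as `first..last`. As a small consistency check, the sketch below computes the number of residues a range covers; the helper name `span_length` is hypothetical, and inclusive 1-based coordinates are assumed.

```python
def span_length(residue_range: str) -> int:
    """Length of an inclusive 'first..last' residue range (assumed 1-based)."""
    first, last = (int(part) for part in residue_range.split(".."))
    if last < first:
        raise ValueError("range end precedes range start")
    return last - first + 1


# Ranges from the AAM40793 row: both sides of the match cover 573 residues.
print(span_length("347..919"), span_length("1..573"))
```

Comparing the two spans of a row against the denominator of its identity fraction is a quick way to spot transcription errors in the coordinates.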

In a BLAST search of public sequence databases, the NOV54a protein was found to
have homology to the proteins shown in the BLASTP data in Table 54E.

Table 54E. Public BLASTP Results for NOV54a

Protein Accession Number | Protein/Organism/Length | NOV54a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
CAD39169 | Hypothetical protein - Homo sapiens (Human), 987 aa (fragment). | 38..1025; 1..987 | 984/988 (99%); 984/988 (99%) | 0.0
P97434 | Rho-interacting protein 3 (p116RIP) (RIP3) - Mus musculus (Mouse), 1024 aa. | 1..1024; 1..1023 | 937/1027 (91%); 967/1027 (93%) | 0.0
Q9ERE6 | Rho-interacting protein 3 (p116RIP) (RIP3) - Rattus norvegicus (Rat), 1029 aa. | 1..1024; 1..1028 | 934/1032 (90%); 968/1032 (93%) | 0.0
Q96G40 | Unknown (Protein for IMAGE:4121355) - Homo sapiens (Human), 845 aa (fragment). | 156..1019; 3..826 | 820/864 (94%); 822/864 (94%) | 0.0
BAC03851 | CDNA FLJ34968 fis, clone NTONG2004844, highly similar to P116 RHO-INTERACTING PROTEIN - Homo sapiens (Human), 586 aa. | 208..793; 1..586 | 585/586 (99%); 585/586 (99%) | 0.0



PFam analysis predicts that the NOV54a protein contains the domains shown in
Table 54F.



Table 54F. Domain Analysis of NOV54a

Pfam Domain | NOV54a Match Region | Identities; Similarities for the Matched Region | Expect Value
PH | 44..145 | 24/102 (24%); 82/102 (80%) | 1.9e-12
PH | 388..483 | 31/97 (32%); 73/97 (75%) | 3.1e-20






Example 55. 

The NOV55 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 55A.



Table 55A. NOV55 Sequence Analysis 



SEQ ID NO: 207    669 bp

NOV55a,
CG132650-01 DNA
Sequence

ATGGCGGATGGGAGCAGCGATGCGGCTAGGGAACCTCGCCCTGCACCAGCCCCAATCAGACGC


CGCTCCTCCAACTACCGCGCTTATGCCACGGAGCCGCACGCCAAGAAAAAATCTAAGATCTCC 
GCCTCGAGAAAATTGCAGCTGAAGACTCTGCTGCTGCAGATTGCAAAGCAAGAGCTGGAGCGA 
GAGGCGGAGGAGCGGCGCGGAGAGAAGGGGCGCGCTCTGAGCACCCGCTGCCAGCCGCTGGAG
TTGGCCGGGCTGGGCTTCGCGGAGCTGCAGGACTTGTGCCGACAGCTCCACGCCCGTGTGGAC 
AAGGTGGATGAAGAGAGATACGACATAGAGGCAAAAGTCACCAAGAACATCACGGAGATTGCA 
GATCTGACTCAGAAGATCTTTGACCTTCGAGGCAAGTTTAAGCGGCCCACCCTGCGGAGAGTG 
AGGATCTCTGCAGATGCCATGATGCAGGCGCTGCTGGGGGCCCGGGCTAAGGAGTCCCTGGAC 
CTGCGGGCCCACCTCAAGCAGGTGAAGAAGGAGGACACCGAGAAGGAAAACCGGGAGGTGGGA 
GACTGGCGGAAGAACATCGATGCACTGAGTGGAATGGAGGGCCGCAAGAAAAAGTTTGAGAGC 
AAGAAAAAGAAAAAGAAACATCACCATCACCATGACTGA 




ORF Start: ATG at 1
ORF Stop: TGA at 667

SEQ ID NO: 208    222 aa    MW at 25577.1kD

NOV55a,
CG132650-01 Protein
Sequence 


MADGSSDAAREPRPAPAPIRRRSSNYRAYATEPHAKKKSKISASRKLQLKTLLLQIAKQELER 
EAEERRGEKGRALSTRCQPLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITEIA 
DLTQKIFDLRGKFKRPTLRRVRISADAMMQALLGARAKESLDLRAHLKQVKKEDTEKENREVG 
DWRKNIDALSGMEGRKKKFESKKKKKKHHHHHD 
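
The relationship between each listed ORF and its encoded polypeptide can be illustrated by translating the nucleotide sequence with the standard genetic code. The following is a minimal Python sketch, not the method used to prepare the tables; the names `CODON_TABLE` and `translate` are hypothetical, and only the first 33 bases of the NOV55a DNA sequence are used as input.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 11 codons of the NOV55a ORF (SEQ ID NO: 207).
print(translate("ATGGCGGATGGGAGCAGCGATGCGGCTAGGGAA"))  # MADGSSDAARE
```

The output matches the first eleven residues of the NOV55a protein sequence (SEQ ID NO: 208).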




SEQ ID NO: 209 803 bp 




NOV55b, 

CG132650-05 DNA
Sequence 


r rCAAGCAGr > CCGGAGGAGACTGACGGTCCCTGGGACCCTGAAGGTCACCCGGGCGCiCcrcr r rr 


ACTGACCCTCCAAACGCCCCTGTCCTCGCCCTGCCTCCTGCCATTCCCGGCCTGAGTCTCAGC 


ATGGCGGATGGGAGCAGCGATGCGGCTAGGGAACCTCGCCCTGCACCAGCCCCAATCAGACGC 
CGCTCCTCCAACTACCGCGCTTATGCCACGGAGCCGCACGCCAAGAAAAAATCTAAGATCTCC 
GCCTCGAGAAAATTGCAGCTGAAGACTCTGCTGCTGCAGATTGCAAAGCAAGAGCTGGAGCGA 
GAGGCGGAGGAGCGGCGCGGAGAGAAGGGGCGCGCTCTGAGCACCCGCTGCCAGCCGCTGGAG 
TTGGCCGGGCTGGGCTTCGCGGAGCTGCAGGACTTGTGCCGACAGCTCCACGCCCGTGTGGAC 
AAGGTGGATGAAGAGAGATACGACATAGAGGCAAAAGTCACCAAGAACATCACGAAGATCTTT 
GACCTTCGAGGCAAGTTTAAGCGGCCCACCCTGCGGAGAGTGAGGATCTCTGCAGATGCCATG 
ATGCAGGCGCTGCTGGGGGCCCGGGCTAAGGAGTCCCTGGACCTGCGGGCCCACCTCAAGCAG 
GTGAAGAAGGAGGACACCGAGAAGGAAAACCGGGAGGTGGGAGACTGGCGGAAGAACATCGAT
GCACTGAGTGGAATGGAGGGCCGCAAGAAAAAGTTTGAGAGCTGAGCCTTCCTGCCTACTGCC 
CCTGCCCTGAGGAGGGCCACTGAGGAATAAAGCTTCTCTCTGAGCTG 




ORF Start: ATG at 127
ORF Stop: TGA at 736

SEQ ID NO: 210    203 aa    MW at 23236.4kD

NOV55b,
CG132650-05 Protein
Sequence 


MADGSSDAAREPRPAPAPIRRRSSNYRAYATEPHAKKKSKISASRKLQLKTLLLQIAKQELER 
EAEERRGEKGRALSTRCQPLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITKIF
DLRGKFKRPTLRRVRISADAMMQALLGARAKESLDLRAHLKQVKKEDTEKENREVGDWRKNID 
ALSGMEGRKKKFES 




SEQ ID NO: 211    689 bp

NOV55c,
CG132650-02 DNA
Sequence

CCGGAGGAGACTGACGGTCCCTGGGACCCTGAAGGTCACCCGGGCGGCCCCCTCACTGACCCT


CCAAACGCCCCTGTCCTCGCCCTGCCTCCTGCCATTCCCGGCCTGAGTCTCAGCATGGCGGAT 


GGGAGCAGCGATGCGGCTAGGGAACCTCGCCCTGCACCAGCCCCAATCAGACGCCGCTCCTCC
AACTACCGCGGAGAGAAGGGGCGCGCTCTGAGCACCCGCTGCCAGCCGCTGGAGTTGGCCGGG 
CTGGGCTTCGCGGAGCTGCAGGACTTGTGCCGACAGCTCCACGCCCGTGTGGACAAGGTGGAT
GAAGAGAGATACGACATAGAGGCAAAAGTCACCAAGAACATCACGGAGATTGCAGATCTGACT 
CAGAAGATCTTTGACCTTCGAGGCAAGTTTAAGCGGCCCACCCTGCGGAGAGTGAGGATCTCT
GCAGATGCCATGATGCAGGCGCTGCTGGGGGCCCGGGCTAAGGAGTCCCTGGACCTGCGGGCC 
CACCTCAAGCAGGTGAAGAAGGAGGACACCGAGAAGGAAAACCGGGAGGTGGGAGACTGGCGC
AAGAACATCGATGCACTGAGTGGAATGGAGGGCCGCAAGAAAAAGTTTGAGAGCTGAGCCTTC 
CTGCCTACTGCCCCTGCCCTGAGGAGGGCCCTGAGGAATAAAGCTTCTCTCTGAGCTGA 




ORF Start: ATG at 118
ORF Stop: TGA at 622

SEQ ID NO: 212    168 aa    MW at 19133.6kD

NOV55c,
CG132650-02 Protein
Sequence


MADGSSDAAREPRPAPAPIRRRSSNYRGEKGRALSTRCQPLELAGLGFAELQDLCRQLHARVD 
KVDEERYDIEAKVTKNITEIADLTQKIFDLRGKFKRPTLRRVRISADAMMQALLGARAKESLD 
LRAHLKQVKKEDTEKENREVGDWRKNIDALSGMEGRKKKFES 




SEQ ID NO: 213    776 bp

NOV55d,
CG132650-03 DNA
Sequence

CTGAAGGTCACCCGGGCGGCCCCCTCACTGACCCTCCAAACGCCCCTGTCCTCGCCCTGCCTC


CTGCCATTCCCGGCCTGAGTCTCAGCATGGCGGATGGGAGCAGCGATGCGGCTAGGGAACCTC 
GCCCTGCACCAGCCCCAATCAGACGCCGCTCCTCCAACTACCGCGCTTATGCCACGGAGCCGC 
ACGCCAAGAAAAAATCTAAGATCTCCGCCTCGAGAAAATTGCAGCTGAAGACTCTGCTGCTGC 
AGATTGCAAAGCAAGAGCTGGAGCGAGAGGCGGAGGAGCGGCGCGGAGAGAAGGGGCGCGCTC


TGAGCACCCGCTGCCAGTCGCTGGAGTTGGCCGGGCTGGGCTTCGCGGAGCTGCAGGACTTGT 
GCCGACAGCTCCACGCCCGTGTGGACAAGGTGGATGAAGAGAGATACGACATAGAGGCAAAAG 
TCACCAAGAACATCACGAAGATCTTTGACCTTCGAGGCAAGTTTAAGCGGCCCACCCTGCGGA 
GAGTGAGGATCTCTGCAGATGCCATGATGCAGGCGCTGCTGGGGGCCCGGGCTAAGGAGTCCC 
TGGACCTGCGGGCCCACCTCAAGCAGGTGAAGAAGGAGGACACCGAGAAGGAAAACCGGGAGG 
TGGGAGACTGGCGCAAGAACATCGATGCACTGAGTGGAATGGAGGGCCGCAAGAAAAAGTTTG 
AGAGCTGAGCCTTCCTGCCTACTGCCCCTGCCCTGAGGAGGGCCCTGAGGAATAAAGCTTCTC


TCTGAGCTGAAAAAAAAAAA 


ORF Start: ATG at 90
ORF Stop: TGA at 699

SEQ ID NO: 214    203 aa    MW at 23226.3kD

NOV55d,
CG132650-03 Protein
Sequence


MADGSSDAAREPRPAPAPIRRRSSNYRAYATEPHAKKKSKISASRKLQLKTLLLQIAKQELER 
EAEERRGEKGRALSTRCQSLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITKIF 
DLRGKFKRPTLRRVRISADAMMQALLGARAKESLDLRAHLKQVKKEDTEKENREVGDWRKNID 
ALSGMEGRKKKFES 
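
NOV55b and NOV55d are both listed as 203 aa, so they can be compared position by position. The sketch below is a hypothetical helper (`substitutions` is not a name from the original text) that reports differing residues; it is applied only to the second 63-residue line of each listing, on the assumption that the first and third lines, which read identically above, contribute no differences.

```python
def substitutions(ref: str, alt: str, offset: int = 0):
    """List (position, ref_residue, alt_residue) for equal-length sequences.

    Positions are 1-based and shifted by `offset`, the number of residues
    that precede the compared segment in the full-length protein.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length")
    return [
        (offset + i, a, b)
        for i, (a, b) in enumerate(zip(ref, alt), start=1)
        if a != b
    ]


# Residues 64-126 of NOV55b (SEQ ID NO: 210) and NOV55d (SEQ ID NO: 214).
nov55b = "EAEERRGEKGRALSTRCQPLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITKIF"
nov55d = "EAEERRGEKGRALSTRCQSLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITKIF"
print(substitutions(nov55b, nov55d, offset=63))  # [(82, 'P', 'S')]
```

The single difference corresponds to the CCG versus TCG codon in the respective DNA listings.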


SEQ ID NO: 215    855 bp

NOV55e,
CG132650-04 DNA
Sequence


CGGCCGCGTCGACCCGGAGGAGACTGACGGTCCCTGGGACCCTGAAGGTCACCCGGGCGGCCC 


CCTCACTGACCCTCCAAACGCCCCTGTCCTCGCCCTGCCTCCTGCCATTCCCGGCCTGAGTCT 


CAGCATGGCGGATGGGAGCAGCGATGCGGCTAGGGAACCTCGCCCTGCACCAGCCCCAATCAG 
ACGCCGCTCCTCCAACTACCGCGCTTATGCCACGGAGCCGCACGCCAAGAAAAAATCTAAGAT 
CTCCGCCTCGAGAAAATTGCAGCTGAAGACTCTGCTGCTGCAGATTGCAAAGCAAGAGCTGGA
GCGAGAGGCGGAGGAGCGGCGCGGAGAGAAGGGGCGCGCTCTGAGCACCCGCTGCCAGCCGCT 
GGAGTTGGCCGGGCTGGGCTTCGCGGAGCTGCAGGACTTGTGCCGACAGCTCCACGCCCGTGT 
GGACAAGGTGGATGAAGAGAGATACGACATAGAGGCAAAAGTCACCAAGAACATCACGGAGAT 
TGCAGATCTGACTCAGAAGATCTTTGACCTTCGAGGCAAGTTTAAGCGGCCCACCCTGCGGAG 
AGTGAGGATCTCTGCAGATGCCATGATGCAGGCGCTGCTGGGGGCCCGGGCTAAGGAGTCCCT
GGACCTGCGGGCCCACCTCAAGCAGGTGAAGAAGGAGGACACCGAGAAGGAAAACCGGGAGGT
GGGAGACTGGCGCAAGAACATCGATGCACTGAGTGGAATGGAGGGCCGCAAGAAAAAGTTTGA
GAGCTGAGCCTTCCTGCCTACTGCCCCTGCCCTGAGGAGGGCCCTGAGGAATAAAGCTTCTCT


CTGAGCTGAAAAAAAAAAAAAAAAAAACCCAAAAAA




ORF Start: ATG at 131 




ORF Stop: TGA at 761 




SEQ ID NO: 216    210 aa    MW at 24007.2kD

NOV55e,
CG132650-04 Protein
Sequence 


MADGSSDAAREPRPAPAPIRRRSSNYRAYATEPHAKKKSKISASRKLQLKTLLLQIAKQELER 
EAEERRGEKGRALSTRCQPLELAGLGFAELQDLCRQLHARVDKVDEERYDIEAKVTKNITEIA 
DLTQKIFDLRGKFKRPTLRRVRISADAMMQALLGARAKESLDLRAHLKQVKKEDTEKENREVG 
DWRKNIDALSGMEGRKKKFES 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 55B. 



Table 55B. Comparison of NOV55a against NOV55b through NOV55e.

Protein Sequence | NOV55a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV55b | 1..210; 1..203 | 154/210 (73%); 154/210 (73%)
NOV55c | 69..210; 27..168 | 142/142 (100%); 142/142 (100%)
NOV55d | 1..210; 1..203 | 153/210 (72%); 153/210 (72%)
NOV55e | 1..210; 1..210 | 161/210 (76%); 161/210 (76%)

Further analysis of the NOV55a protein yielded the following properties shown in
Table 55C.



Table 55C. Protein Sequence Properties NOV55a

PSort analysis: 0.9855 probability located in nucleus; 0.3000 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV55a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 55D.



Table 55D. Geneseq Results for NOV55a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV55a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAW41570 | Modified human cardiac troponin I HcTnI-K6-H5-D - Homo sapiens, 222 aa. [WO9739132-A1, 23-OCT-1997] | 1..222; 1..222 | 222/222 (100%); 222/222 (100%) | e-124
AAY03174 | Recombinant human troponin I - Homo sapiens, 226 aa. [WO9854219-A1, 03-DEC-1998] | 1..221; 8..225 | 214/221 (96%); 214/221 (96%) | e-117
AAY03168 | Recombinant Human cardiac troponin I - Homo sapiens, 226 aa. [WO9854218-A1, 03-DEC-1998] | 1..221; 8..225 | 214/221 (96%); 214/221 (96%) | e-117
AAW18054 | Recombinant human myofibrillar contractile protein Troponin I - Synthetic, 226 aa. [WO9719955-A1, 05-JUN-1997] | 1..221; 8..225 | 214/221 (96%); 214/221 (96%) | e-117
AAW41573 | Modified human cardiac troponin I HcTnI-(HL)3 - Homo sapiens, 216 aa. [WO9739132-A1, 23-OCT-1997] | 1..221; 1..215 | 213/221 (96%); 213/221 (96%) | e-115
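
Expect values in these tables appear both in full form (for example `2e-78`) and in a shorthand without a leading mantissa (for example `e-124`). A minimal sketch for normalizing them to floating-point numbers follows; the helper name `parse_expect` is hypothetical, and reading the shorthand `e-124` as 1e-124 is an assumption about the notation.

```python
def parse_expect(text: str) -> float:
    """Parse an Expect value, treating a bare 'e-NNN' as '1e-NNN' (assumed)."""
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


# Sorting rows by Expect value, smallest (most significant) first.
values = ["e-115", "e-124", "0.0", "2e-78"]
print(sorted(values, key=parse_expect))
```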



In a BLAST search of public sequence databases, the NOV55a protein was found to
have homology to the proteins shown in the BLASTP data in Table 55E.

Table 55E. Public BLASTP Results for NOV55a

Protein Accession Number | Protein/Organism/Length | NOV55a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
TPHUIC | troponin I, cardiac muscle - human, 210 aa. | 1..210; 1..210 | 210/210 (100%); 210/210 (100%) | e-114
P19429 | Troponin I, cardiac muscle - Homo sapiens (Human), 209 aa. | 2..210; 1..209 | 209/209 (100%); 209/209 (100%) | e-114
AAM33343 | Cardiac troponin I - Canis familiaris (Dog), 211 aa. | 1..209; 1..210 | 200/210 (95%); 203/210 (96%) | e-107
156441 | troponin I - rat, 211 aa. | 1..209; 1..210 | 195/210 (92%); 202/210 (95%) | e-106
A53805 | troponin I, cardiac - mouse, 211 aa. | 1..209; 1..210 | 195/210 (92%); 201/210 (94%) | e-106



PFam analysis predicts that the NOV55a protein contains the domains shown in
Table 55F.



Table 55F. Domain Analysis of NOV55a

Pfam Domain | NOV55a Match Region | Identities; Similarities for the Matched Region | Expect Value
Troponin | 46..177 | 77/190 (41%); 123/190 (65%) | 8.6e-59

Example 56.

The NOV56 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 56A. 



Table 56A. NOV56 Sequence Analysis 




SEQ ID NO: 217    545 bp

NOV56a,
CG133808-01 DNA
Sequence 


GGCGACTGGCTGCACGCGCCGCTGGAGCCCGACGTGCGCGCGGTGGTGGTGGGCTTTGACCCG 


CACTTCAGCTACATGAAGCTCACCAAGGCCCTGCGCTACCTGCAGCAGCCCGGCTGCCTGCTC
GTGGGCACCAACATGGACAACCGGCTTCCGCTTGAGAACGGCCGCTTCATCGCGGGTACCGGG 
TGTCTGGTCCGAGCCGTGGAGATGGCCGCCCAGCGCCAGGCCGACATCATCGGGAAGCCCAGC 
CGCTTCATTTTCGACTGCGTGTCCCAGGAATACGGCATCAACCCCGAGCGCACCGTCATGGTG
GGAGACCGCCTGGACACAGACATCCTCCTAGGCGCCACCTGTGGCCTGAAGACCATCCTGACC 
CTCACCGGAGTCTCCACTCTAGGGGATGTGAAGAATAATCAGGAAAGTGACTGCGTGTCTAAG 
AAGAAAATGGTCCCTGACTTCTATGTTGACAGCATAGCCGACCTTTTGCCTGCCCTTCAAGGT 
TAAAGATTGAGTGTCTTTAATCTGCAGAATAAAAAAAAAAA 




ORF Start: ATG at 76 ORF Stop: TAA at 505 




SEQ ID NO: 218    143 aa    MW at 15565.9kD

NOV56a,
CG133808-01 Protein
Sequence



MKLTKALRYLQQPGCLLVGTNMDNRLPLENGRFIAGTGCLVRAVEMAAQRQADIIGKPSRFIF
DCVSQEYGINPERTVMVGDRLDTDILLGATCGLKTILTLTGVSTLGDVKNNQESDCVSKKKMV
PDFYVDSIADLLPALQG



Further analysis of the NOV56a protein yielded the following properties shown in 
Table 56B. 



Table 56B. Protein Sequence Properties NOV56a

PSort analysis: 0.6500 probability located in cytoplasm; 0.2184 probability located in lysosome
(lumen); 0.1000 probability located in mitochondrial matrix space; 0.0000
probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV56a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 56C.



Table 56C. Geneseq Results for NOV56a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV56a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAE14458 | Human protein phosphatase-8 - Homo sapiens, 321 aa. [WO200196546-A2, 20-DEC-2001] | 1..143; 179..321 | 143/143 (100%); 143/143 (100%) | 2e-78
AAM04137 | Peptide #2819 encoded by probe for measuring breast gene expression - Homo sapiens, 94 aa. [WO200157270-A2, 09-AUG-2001] | 37..130; 1..94 | 94/94 (100%); 94/94 (100%) | 5e-49
AAM28902 | Peptide #2939 encoded by probe for measuring placental gene expression - Homo sapiens, 94 aa. [WO200157272-A2, 09-AUG-2001] | 37..130; 1..94 | 94/94 (100%); 94/94 (100%) | 5e-49
AAM16403 | Peptide #2837 encoded by probe for measuring cervical gene expression - Homo sapiens, 94 aa. [WO200157278-A2, 09-AUG-2001] | 37..130; 1..94 | 94/94 (100%); 94/94 (100%) | 5e-49
AAM68596 | Human bone marrow expressed probe encoded protein SEQ ID NO: 28902 - Homo sapiens, 94 aa. [WO200157276-A2, 09-AUG-2001] | 37..130; 1..94 | 94/94 (100%); 94/94 (100%) | 5e-49




In a BLAST search of public sequence databases, the NOV56a protein was found to
have homology to the proteins shown in the BLASTP data in Table 56D.



Table 56D. Public BLASTP Results for NOV56a

Protein Accession Number | Protein/Organism/Length | NOV56a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9CVY8 | 1700012G19Rik protein - Mus musculus (Mouse), 224 aa (fragment). | 1..143; 82..224 | 134/143 (93%); 139/143 (96%) | 1e-72
Q9D6Q5 | 1700012G19Rik protein - Mus musculus (Mouse), 122 aa. | 22..143; 1..122 | 115/122 (94%); 119/122 (97%) | 5e-61
Q8VD52 | Reg 1 binding protein I - Rattus norvegicus (Rat), 204 aa (fragment). | 2..143; 49..187 | 68/142 (47%); 96/142 (66%) | 3e-31
Q9UGY2 | DJ37E16.5 (Novel protein similar to NITROPHENYLPHOSPHATASES from VARIOUS ORGANISMS) (Hypothetical 31.7 kDa protein) - Homo sapiens (Human), 296 aa. | 2..142; 158..295 | 68/141 (48%); 94/141 (66%) | 2e-30
Q96GD0 | Similar to hypothetical protein dJ37E16.5 - Homo sapiens (Human), 179 aa (fragment). | 2..142; 41..178 | 68/141 (48%); 94/141 (66%) | 2e-30



PFam analysis predicts that the NOV56a protein contains the domains shown in
Table 56E.



Table 56E. Domain Analysis of NOV56a

Pfam Domain | NOV56a Match Region | Identities; Similarities for the Matched Region | Expect Value





Example 57. 

The NOV57 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 57A.



Table 57A. NOV57 Sequence Analysis 




SEQ ID NO: 219    339 bp




NOV57a, 

CG136288-01 DNA 
Sequence 


CCTGCCCACTTTACCCAGATGTCCAGCAAGGTGGCCATCAACAGTGACATTGGGCAGGCCCTC 


TGGGCAGTGGAGCAGCTCCAGATGGAGGCAGGCATCGACCAAGTGAAGGTGAGGAAGATGGCC 
GCCGATCTGCTGAAGTTCTGCACGGAGCAGGCCAAGAATGACCCCTTCCTTGTGGGCATCCCG 
GCCGCCACCAACTCCTTCAAGGAGAAGAAGCCCTATGCCATCCTATGAACCCAGGGCAATGCC 


ACCCTGTGGCCTGGGCAAACCAGGGGGCCTCAATAAACATGAAGTGAATACTTCTCAGGGCAT 


GGCTGAGCTGGGCTGAGATGGGAG 




ORF Start: ATG at 19
ORF Stop: TGA at 235

SEQ ID NO: 220    72 aa    MW at 7911.2kD

NOV57a,
CG136288-01 Protein
Sequence


MSSKVAINSDIGQALWAVEQLQMEAGIDQVKVRKMAADLLKFCTEQAKNDPFLVGIPAATNSF
KEKKPYAIL 



Further analysis of the NOV57a protein yielded the following properties shown in 
Table 57B. 



Table 57B. Protein Sequence Properties NOV57a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability
located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted






A search of the NOV57a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 57C.



Table 57C. Geneseq Results for NOV57a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV57a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAB33831 | Human secreted protein BLAST search protein SEQ ID NO: 175 - Homo sapiens, 72 aa. [WO200056753-A1, 28-SEP-2000] | 1..72; 1..72 | 38/72 (52%); 52/72 (71%) | 1e-15
AAO14753 | Human Guanine nucleotide binding protein gamma 7 (GNG7) - Homo sapiens, 68 aa. [WO200218647-A1, 07-MAR-2002] | 5..72; 1..68 | 35/68 (51%); 52/68 (76%) | 4e-15
AAW09416 | Human G protein gamma-7 subunit - Homo sapiens, 68 aa. [WO9637513-A1, 28-NOV-1996] | 5..72; 1..68 | 35/68 (51%); 52/68 (76%) | 4e-15
ABB57442 | Human secreted protein encoding polypeptide SEQ ID NO 88 - Homo sapiens, 72 aa. [WO200183510-A1, 08-NOV-2001] | 1..72; 1..72 | 36/72 (50%); 52/72 (72%) | 7e-15
ABB57441 | Human secreted protein encoding polypeptide SEQ ID NO 87 - Homo sapiens, 72 aa. [WO200183510-A1, 08-NOV-2001] | 1..72; 1..72 | 36/72 (50%); 52/72 (72%) | 7e-15


In a BLAST search of public sequence databases, the NOV57a protein was found to
have homology to the proteins shown in the BLASTP data in Table 57D.


Table 57D. Public BLASTP Results for NOV57a

Protein Accession Number | Protein/Organism/Length | NOV57a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q28024 | Guanine nucleotide-binding protein G(I)/G(S)/G(O) gamma-12 subunit (Gamma-S1) - Bos taurus (Bovine), 71 aa. | 2..72; 1..71 | 37/71 (52%); 51/71 (71%) | 1e-14
O60262 | Guanine nucleotide-binding protein G(I)/G(S)/G(O) gamma-7 subunit - Homo sapiens (Human), 68 aa. | 5..72; 1..68 | 35/68 (51%); 52/68 (76%) | 1e-14
AAM12593 | Guanine nucleotide binding protein gamma 12 - Homo sapiens (Human), 72 aa. | 1..72; 1..72 | 36/72 (50%); 52/72 (72%) | 2e-14
AAK61365 | G-protein gamma 7 - Mus musculus (Mouse), 68 aa. | 9..72; 5..68 | 33/64 (51%); 50/64 (77%) | 8e-14
P30671 | Guanine nucleotide-binding protein G(I)/G(S)/G(O) gamma-7 subunit (Gamma-II) - Bos taurus (Bovine), 68 aa. | 5..72; 1..68 | 33/68 (48%); 52/68 (75%) | Se-14

PFam analysis predicts that the NOV57a protein contains the domains shown in
Table 57E.



Table 57E. Domain Analysis of NOV57a

Pfam Domain | NOV57a Match Region | Identities; Similarities for the Matched Region | Expect Value
G-gamma | 11..65 | 23/55 (42%); 47/55 (85%) | 5.6e-17

Example 58.

The NOV58 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 58A. 



Table 58A. NOV58 Sequence Analysis 




SEQ ID NO: 221    1332 bp

NOV58a,

CG136933-01 DNA 
Sequence 


ATAATGGATTCTGAATTAATGCATAGTATAGTAGGAAGCTATCATAAACCTCCAGAAAGAGTA 
TTTGTTCCCTCATTCACCCAGAATGAACCATCTCAGAATTGCCATCCTGCAAACTTAGAAGTT 
ACCTCTCCTAAGATACTTCATAGCCCAAATAGCCAAGCTCTTATTTTAGCCTTAAAAACTCTT 
CAGGAAAAAATTCATCGTTTAGAGCTGGAGAGAACACAAGCTGAAGATAACCTGAACATTCTT 
TCCAGAGAAGCAGCACAGTATAAGAAGGCCTTAGAGAATGAAACAAATGAGAGAAATCTGGCA 
CACCAGGAGCTGATAAAGCAGAAAAAAGATATAAGTATACAGTTAAGCTCAGCCCAGTCTCGT 
TGTACTCTTCTAGAGAAGCAACTAGAATATACAAAGAGAATGGTTCTCAACGTAGAGCGAGAA 
AAGAACATGATCCTAGAACAACAGGCCCAGCTTCAGAGGGAAAAAGAACAAGATCAGATGAAG 
CTGTATGCAAAACTTGAAAAGCTTGATGTCTTAGAAAAAGAGTGTTTCAGACTTACAACAACT
CAGAAAACTGCTGAGGACAAGATTAAACATTTAGAAGAAAAACTTAAGGAAGAAGAACATCAG
CGTAAGCTATTTCAAGACAAAGCTTCTCAGCTTCAAACTGGACTTGAAATCAGTAAAATTATA 
ATGTCTTCAGTTTCAAATCTAAAGCACTCCAAGGAAAAGAAGAAATCTTCAAAGTTTTTGCAG 
ATGAGGCAACATCGTGACCCACATATCCTTCAGAAACCTTTTAACGTGACTGAGACTAGATGT 
CTCCCCAAGCCTTCTAGAACAACTTCCTGGTGTAAAGCTATTCCTCCTGACTCAGAAAAGTCC 
ATTTCCATTTGTGACAATTTATCTGAACTTTTGATGGCAATGCAAGATGAGCTGGACCAAATG 
AGCATGGAGCACCAAGAACTACTGAAACAAATGAAGGAAACTGAAAGTCATTCAGTCTGTGAC 
GACATAGAATGTGAACTAGAGTGTTTACTCAAGAAAATGGAAATTAAAGGAGAACAAATCTCC 
AAACTGAAGAAGCATCAAGACAGTGTAAGAAGGCTTCAGCAAAAAGTTCAAAACTCAAAGATG 
AGTGAAGCTTCAGGTATTCAGCAAGAAGACAGCTACCCTAAAGGATCAAAGAACATAAAAAAT 
AGCCCCAGAAAATGTTTGACTGACACTAACCTTTTTCAGAAAAACAGCAGCTTTCATCCAATA 
CGAGTTCATAATCTTCAAATGAAATTGAGAAGAGATGATATCATGTGGGAACAGTAACAAAAC 
AGCAAAACT 




ORF Start: ATG at 4
ORF Stop: TAA at 1315

SEQ ID NO: 222    437 aa    MW at 51102.1kD


NOV58a, 

CG136933-01 Protein 
Sequence 


MDSELMHSIVGSYHKPPERVFVPSFTQNEPSQNCHPANLEVTSPKILHSPNSQALILALKTLQ 
EKIHRLELERTQAEDNLNILSREAAQYKKALENETNERNLAHQELIKQKKDISIQLSSAQSRC 
TLLEKQLEYTKRMVLNVEREKNMILEQQAQLQREKEQDQMKLYAKLEKLDVLEKECFRLTTTQ
KTAEDKIKHLEEKLKEEEHQRKLFQDKASQLQTGLEISKIIMSSVSNLKHSKEKKKSSKFLQM 
RQHRDPHILQKPFNVTETRCLPKPSRTTSWCKAIPPDSEKSISICDNLSELLMAMQDELDQMS 
MEHQELLKQMKETESHSVCDDIECELECLLKKMEIKGEQISKLKKHQDSVRRLQQKVQNSKMS 
EASGIQQEDSYPKGSKNIKNSPRKCLTDTNLFQKNSSFHPIRVHNLQMKLRRDDIMWEQ 




Further analysis of the NOV58a protein yielded the following properties shown in 
Table 58B. 



Table 58B. Protein Sequence Properties NOV58a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV58a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 58C.

Table 58C. Geneseq Results for NOV58a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV58a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM00812 | Human bone marrow protein, SEQ ID NO: 175 - Homo sapiens, 294 aa. [WO200153453-A2, 26-JUL-2001] | 1..281; 1..292 | 255/292 (87%); 260/292 (88%) | e-137
AAM00925 | Human bone marrow protein, SEQ ID NO: 401 - Homo sapiens, 206 aa. [WO200153453-A2, 26-JUL-2001] | 249..436; 1..188 | 186/188 (98%); 187/188 (98%) | e-106
AAB84841 | Human FGF Associated Protein, FAP - Homo sapiens, 500 aa. [JP2001061477-A, 13-MAR-2001] | 51..402; 65..462 | 138/398 (34%); 212/398 (52%) | 1e-58
AAB28209 | Novel human protein #7 - Homo sapiens, 114 aa. [WO200052165-A2, 08-SEP-2000] | 110..220; 3..113 | 51/111 (45%); 80/111 (71%) | 8e-23
ABG15517 | Novel human diagnostic protein #15508 - Homo sapiens, 662 aa. [WO200175067-A2, 11-OCT-2001] | 54..229; 451..653 | 68/203 (33%); 108/203 (52%) | 2e-20



5 " ^ 

In a BLAST search of public sequence databases, the NOV58a protein was found to
have homology to the proteins shown in the BLASTP data in Table 58D.



Table 58D. Public BLASTP Results for NOV58a

Protein Accession Number | Protein/Organism/Length | NOV58a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9CWM5 | 2410017P07Rik protein - Mus musculus (Mouse), 429 aa. | 1..437; 1..429 | 309/437 (70%); 357/437 (80%) | e-169
Q8VDS7 | Similar to RIKEN cDNA 2410017P07 gene - Mus musculus (Mouse), 400 aa. | 1..437; 1..400 | 289/437 (66%); 339/437 (77%) | e-153
Q9CZE0 | 2410017P07Rik protein - Mus musculus (Mouse), 353 aa. | 130..437; 50..353 | 206/308 (66%); 246/308 (78%) | e-110
Q9D5A1 | 4930484D11Rik protein - Mus musculus (Mouse), 256 aa. | 1..248; 1..244 | 203/248 (81%); 219/248 (87%) | e-108
Q14704 | KIAA0092 protein - Homo sapiens (Human), 474 aa. | 51..402; 65..436 | 137/375 (36%); 212/375 (56%) | 8e-60





PFam analysis predicts that the NOV58a protein contains the domains shown in the 
Table 58E. 



Table 58E. Domain Analysis of NOV58a 



Pfam Domain | NOV58a Match Region | Identities; Similarities for the Matched Region | Expect Value



Example 59. 

The NOV59 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 59A. 



Table 59A. NOV59 Sequence Analysis

1045 bp

NOV59a,
CG136942-01 DNA
Sequence



TCTCCATGACATGTTGATGCTGGCTGAGCAGCAGCAGAAGCAGAAGTGGGCTGTGAATACTCA 



AAACACTGCCTGGAGTAATGCTGATTCTAAATTTGGCCAGAGGATACTAGAGAAGATGGAATG 
GTCTAAAGGAAGGGGTTTAGGGGTTCAGGAGCAAGGAGGCCCAGATGATATTAAAGTTCAAGT 
TAAAAATAACGACCTGGGACTTCAAGCTACAATCAATAATGAAGCCAACTGGATTGCCCATCA 
AGATGATTTTAACTGGCTTCTGGCGGAACTGAACACTTGTCAGAGGCAGGAAACAGCAGACTC 
CTTAGACAACAAGAAAAAGAAATATTTTAGTCTTGAAGAAATTTCCAAAATCTTCAAAAACTG
TGTTCATCATAGGAAATTTACAAAAGAAAAGGATCTATCATCTCGGAGCAAAACAGATCGTGA 
CTGCATTTTTGGGAAAAAACAGAGTAAGAAGACTCCCGAGGGTAATTCCAGTCCCTCCACTCC 
AGACCAGAACAAAACCACGATGACAACCCATGCCTTCACCATCCAGGAGCGTTTTGCCAAGCG 
AATGGCAGCACTGAAGAACAAGCCCCAGGTTGCAGCTCCAGGGCCTGACATTTCCAAGACCCA 
AGTGGAATGCAAAAGGGGGAAGAAAAGAAACAAAGAGGCAACAGGTAAAAATGGGGAGAGTTA 
CCCCCCAACACAGCCTAAGGCCAAGCGGCCTAAAGAGGGAAAGCCTAAGAGAGACAAGGTCCA 
GAAGTCGGCATCCAAGGAGAAAAGAGCACGGACAGACGGACAGTGCAGAGGCCTCTGCTGGGA 
AGAGAGTTCTGAGGCCTCTGCTCAGGGTGCAGGGAATTGTGTGCAGCCACCTGATGGCCAGGA 
TTTCACCCTGAAGCCCAAAAAGACAAGAGGAAAAAAAAAAGCTGCAAAGCCAGTAGAGGTAGC 
AATGGACACTACGCTGAAAGAAACACCAATGAAAAATAAGAAAAAGAAGAAAGGTTCCAAATG
AATTCTCTCCAGCCAGGGCCTTCCGACCACTCAGCTT 



ORF Start: ATG at 11



SEQ ID NO: 224

ORF Stop: TGA at 1007

aa

MW at 37378.2kD



NOV59a,
CG136942-01 Protein
Sequence



MLMLAEQQQKQKWAVNTQNTAWSNADSKFGQRILEKMEWSKGRGLGVQEQGGPDDIKVQVKNN 
DLGLQATINNEANWIAHQDDFNWLLAELNTCQRQETADSLDNKKKKYFSLEEISKIFKNCVHH 
RKFTKEKDLSSRSKTDRDCIFGKKQSKKTPEGNSSPSTPDQNKTTMTTHAFTIQERFAKRMAA 
LKNKPQVAAPGPDISKTQVECKRGKKRNKEATGKNGESYPPTQPKAKRPKEGKPKRDKVQKSA 
SKEKRARTDGQCRGLCWEESSEASAQGAGNCVQPPDGQDFTLKPKKTRGKKKAAKPVEVAMDT 
TLKETPMKNKKKKKGSK 



Further analysis of the NOV59a protein yielded the following properties shown in
Table 59B.

Table 59B. Protein Sequence Properties NOV59a

PSort analysis: 0.9748 probability located in nucleus; 0.1000 probability located in
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000
probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV59a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 59C.

Table 59C. Geneseq Results for NOV59a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV59a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM78624 | Human protein SEQ ID NO 1286 - Homo sapiens, 328 aa. [WO200157190-A2, 09-AUG-2001] | 1..332; 1..328 | 244/332 (73%); 275/332 (82%) | e-133
AAY32206 | Human receptor molecule (REC) encoded by Incyte clone 2825826 - Homo sapiens, 352 aa. [WO9957270-A2, 11-NOV-1999] | 1..329; 1..326 | 236/329 (71%); 272/329 (81%) | e-131
AAM79608 | Human protein SEQ ID NO 3254 - Homo sapiens, 322 aa. [WO200157190-A2, 09-AUG-2001] | 1..286; 39..321 | 211/286 (73%); 237/286 (82%) | e-115
ABB12307 | Human NADH-cytochrome b5 reductase homologue, SEQ ID NO:2677 - Homo sapiens, 322 aa. [WO200157188-A2, 09-AUG-2001] | 1..286; 39..321 | 211/286 (73%); 237/286 (82%) | e-115
ABB12190 | Human tumour suppressor homologue, SEQ ID NO:2560 - Homo sapiens, 127 aa. [WO200157188-A2, 09-AUG-2001] | 1..102; 5..106 | 102/102 (100%); 102/102 (100%) | 1e-55



In a BLAST search of public sequence databases, the NOV59a protein was found to
have homology to the proteins shown in the BLASTP data in Table 59D.

Table 59D. Public BLASTP Results for NOV59a

Protein Accession Number | Protein/Organism/Length | NOV59a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q96BK5 | Pin2-interacting protein X1 (TRF1-interacting protein 1) (Liver-related putative tumor suppressor) (67-11-3 protein) - Homo sapiens (Human), 328 aa. | 1..332; 1..328 | 244/332 (73%); 275/332 (82%) | e-133
Q9CZX5 | Pin2-interacting protein X1 (TRF1-interacting protein 1) (Liver-related putative tumor suppressor) (LPTS1) (67-11-3 protein) - Mus musculus (Mouse), 332 aa. | 1..328; 1..329 | 199/333 (59%); 238/333 (70%) | e-101
Q22705 | T23G7.3 protein - Caenorhabditis elegans, 339 aa. | 1..247; 1..262 | 82/268 (30%); 131/268 (48%) | 1e-27
Q9V952 | CG11180 protein - Drosophila melanogaster (Fruit fly), 726 aa. | 1..328; 1..382 | 99/386 (25%); 165/386 (42%) | 2e-22
Q9URX9 | Hypothetical 31.9 kDa protein C890.05 in chromosome I - Schizosaccharomyces pombe (Fission yeast), 284 aa. | 4..256; 3..244 | 65/258 (25%); 116/258 (44%) | 3e-11



PFam analysis predicts that the NOV59a protein contains the domains shown in the 
Table 59E. 



Table 59E. Domain Analysis of NOV59a 


Pfam Domain | NOV59a Match Region | Identities; Similarities for the Matched Region | Expect Value
G-patch | 26..70 | 18/47 (38%); 33/47 (70%) | 6.8e-07
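The Expect values in these tables appear in two notations: a full form such as 1e-55 and a shortened form such as e-133. The following is a minimal Python sketch for reading both into numbers, assuming that the shortened form stands for a coefficient of 1 (that is, e-133 is read as 1e-133). The function name `parse_expect` is illustrative only and is not part of the source.

```python
def parse_expect(value: str) -> float:
    """Convert a tabulated Expect value to a float.

    Assumption: a value written without a coefficient, such as "e-133",
    means 1e-133.
    """
    text = value.strip().lower()
    if text.startswith("e"):
        # Supply the implied leading coefficient.
        text = "1" + text
    return float(text)


if __name__ == "__main__":
    for raw in ("e-133", "1e-55", "6.8e-07"):
        print(raw, "->", parse_expect(raw))
```

Normalizing the values this way allows rows from different tables to be sorted or filtered on a common numeric scale.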



Example 60. 

The NOV60 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 60A. 



Table 60A. NOV60 Sequence Analysis 




SEQ ID NO: 225 


935 bp


NOV60a, 

CG137017-01 DNA 


ATGTCCAACCGAGTGGTCCGCTGGGAAGCCAGCCACTCTGGGAACTGGTACACAGCCTCAGGA
CCACAGCTGAATGCACAGCTAGAAGGTTGGCTTTCACAAGTAAAGTCTACAAAAAGACCTGCT 
AGAGCCATTATTGCACCCCATGCAGGATATACGTACTGTGGGTCTTGTGCCACCCATGCTTAT
AAACAAGTGGATCCGTCTATTACCCAGGGAATTTTCATCCTTGGGCCTTCTCATCATGTGTCC
CTCTCTCAGTGTGCACTTTCCAGTGTGGATATATATAGGACACCTCTGCATGACCTTCGTATT 
GACCAAAAGATTTACAGAGAACTGTGGAAGACAGGAATGTTTGAACGCATGTCTCTGCAGACA 
GACAAAGATGAACACGGTATTGAAATGCATTTGCCTTGTACAGCTAAAGCCATGGAAATCCAT 
AAGGATGAGTTTACCATTATTCCTGTACTGGTTGGAGCTCTGAGTGAGTCAAAAGAACAGGAA 
TTCGGAAAACTCTTCAGTAAATATCTAGCAGATCCTAGTAATTTCTTTGTGGTTTCTTCTGAT 
TTTTGCCATTGGGGTCAAAGATTCCGTTACAGTTACTATGATGAATCCCAGGGGGAGACTTAT 
AGATCCATTGAACATCTAGATAAAATGGGTATGAGTATTATAGAACAATTAGACCCTGTATCT 
TTTAGCAATTACTTGAGGAAACACCATAATACTATATGTGGAAGACATCCCTTTAGGGTGTTA 
AATGCTATCACAGAGCTCCAGAAGAATGGAAGAAATATGAGCTTTTCCTTTTTGAATTATGCC 
CAGTCAAGCCAGTGTAGAAACTGGCAAGACAGTTCAGTGAGTTACACAACTGGAGCGCTCACG 
GTCCGCTGAAGCTCTGAATCCTCAGGGAGGCCACCTGCACATTCTCATACTCT 




ORF Start: ATG at 1 


ORF Stop: TGA at 889

SEQ ID NO: 226

296 aa

MW at 33879.9kD


NOV60a, 

CG137017-01 Protein 
Sequence 


MSNRVVRWEASHSGNWYTASGPQLNAQLEGWLSQVKSTKRPARAIIAPHAGYTYCGSCATHAY
KQVDPSITQGIFILGPSHHVSLSQCALSSVDIYRTPLHDLRIDQKIYRELWKTGMFERMSLQT
DKDEHGIEMHLPCTAKAMEIHKDEFTIIPVLVGALSESKEQEFGKLFSKYLADPSNFFVVSSD
FCHWGQRFRYSYYDESQGETYRSIEHLDKMGMSIIEQLDPVSFSNYLRKHHNTICGRHPFRVL
NAITELQKNGRNMSFSFLNYAQSSQCRNWQDSSVSYTTGALTVR
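A protein sequence listed in a sequence analysis table can be cross-checked against its DNA sequence by translating the open reading frame from the stated ORF start. The following is a minimal Python sketch of such a check, assuming the standard genetic code and 1-based nucleotide positions. The function name `translate_orf` is illustrative and not part of the source; codons that are not in the table (for example, ones containing a mis-scanned character) are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str, start: int) -> str:
    """Translate from the 1-based position `start` up to the first stop codon.

    Whitespace in `dna` is ignored. Unrecognized codons are emitted as "X".
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon reached
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 63 nucleotides of the NOV60a DNA sequence, with the third
    # character read as G, consistent with the stated "ORF Start: ATG at 1".
    first_line = "ATGTCCAACCGAGTGGTCCGCTGGGAAGCCAGCCACTCTGGGAACTGGTACACAGCCTCAGGA"
    print(translate_orf(first_line, 1))
```

Comparing the translated residues with the listed protein sequence helps to identify characters that were altered during text extraction, such as a pair of valines rendered as a single W.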



Further analysis of the NOV60a protein yielded the following properties shown in 
Table 60B. 



Table 60B. Protein Sequence Properties NOV60a 


PSort analysis: 0.5297 probability located in microbody (peroxisome); 0.4657 probability
located in mitochondrial matrix space; 0.1652 probability located in mitochondrial inner
membrane; 0.1652 probability located in mitochondrial intermembrane space

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV60a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 60C.



Table 60C. Geneseq Results for NOV60a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV60a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABG12611 | Novel human diagnostic protein #12602 - Homo sapiens, 297 aa. [WO200175067-A2, 11-OCT-2001] | 1..295; 1..296 | 269/296 (90%); 278/296 (93%) | e-157
ABG12609 | Novel human diagnostic protein #12600 - Homo sapiens, 297 aa. [WO200175067-A2, 11-OCT-2001] | 1..295; 1..296 | 269/296 (90%); 278/296 (93%) | e-157
ABG16429 | - Homo sapiens, 256 aa. [WO200175067-A2, 11-OCT-2001] | 38..256 | 204/219 (92%) | e-113
ABB71637 | Drosophila melanogaster polypeptide SEQ ID NO 41703 - Drosophila melanogaster, 295 aa. [WO200171042-A2, 27-SEP-2001] | 10..293; 6..291 | 173/286 (60%); 215/286 (74%) | 1e-99
AAG37603 | Arabidopsis thaliana protein fragment SEQ ID NO: 46262 - Arabidopsis thaliana, 291 aa. [EP1033405-A2, 06-SEP-2000] | 9..292; 6..286 | 131/285 (45%); 185/285 (63%) | 8e-74



In a BLAST search of public sequence databases, the NOV60a protein was found to
have homology to the proteins shown in the BLASTP data in Table 60D.



Table 60D. Public BLASTP Results for NOV60a

Protein Accession Number | Protein/Organism/Length | NOV60a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9Y316 | Hypothetical protein CGI-27 (C21orf19-like protein) - Homo sapiens (Human), 297 aa. | 1..295; 1..296 | 270/296 (91%); 279/296 (94%) | e-158
Q91VH6 | CGI-27 protein (C21orf19-like protein) - Mus musculus (Mouse), 297 aa. | 1..295; 1..296 | 268/296 (90%); 279/296 (93%) | e-157
Q9VG04 | CG8031 protein (LP04475P) - Drosophila melanogaster (Fruit fly), 295 aa. | 10..293; 6..291 | 173/286 (60%); 215/286 (74%) | 4e-99
Q9SIR5 | At2g25280 protein (Hypothetical 32.6 kDa protein) - Arabidopsis thaliana (Mouse-ear cress), 291 aa. | 9..292; 6..286 | 131/285 (45%); 185/285 (63%) | 2e-73
T34398 | hypothetical protein C37C3.8 - Caenorhabditis elegans, 302 aa. | 10..295; 13..299 | 138/289 (47%); 187/289 (63%) | 4e-72

PFam analysis predicts that the NOV60a protein contains the domains shown in the
Table 60E.

Table 60E. Domain Analysis of NOV60a

Pfam Domain | NOV60a Match Region | Identities; Similarities for the Matched Region | Expect Value
UPF0103 | 9..292 | 95/334 (28%); 204/334 (61%) | 1.7e-70



Example 61. 

The NOV61 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 61A.

Table 61A. NOV61 Sequence Analysis

SEQ ID NO: 227

607 bp


NOV61a, 

CG137146-01 DNA 
Sequence 


CATTAATAATGTCTTTCATCTGTGAGTGGATCTACAATGGCTTCTACAGTGTGCTCCAGTTTC
TAGGACTCTACAAGAAATCTAGAAAACTTGTTCTCTTGGGTTGGGACAATGTGGATGAAACCA
TTCTTCCTCATATGCTCAAAGATGGTGGATTGGGCCAACAGGCTCCAACACGACCTCTGCCAT 
CAGCAGAGCTGACAACTGCTAGCCTGGCTTTTACGACTTTTGATCTTCTGGGGCATAAGCAAA 
CACGTTGGTTTTGGAACAGTGATCTCCCAGCAACAAATGGGATTGCCTTTCTGGGTTACTGTG 
CAGATTGTCCTCAGATCCTGGAATCCAAAGAGGAGCTCAATGCTTTAATGACTGATGAAAGAA 
TATCCCATGTGCCAGTCCTTATCTTGATTAGCAAATTGGACAGAACAGGCACAATCAGTGAAG 
AAAAACTCTGTCTGCCATTTGGTCTTTATGGACAGACCACAGGAAAGGGAACTTGTGACCCTG 
AAGGAGCCAAATGCCTTTTCCAAGGAATCGTTCACGTTCAGTGTGTACAACAACAGGGCTATG 
GCAAGGGCTTCTGCTGGTTTGCCCAGTATATTGACTGATG




ORF Start: ATG at 9

ORF Stop: TGA at 603

SEQ ID NO: 228

198 aa

MW at 22153.3kD

NOV61a,
CG137146-01 Protein
Sequence


MSFICEWIYNGFYSVLQFLGLYKKSRKLVLLGWDNVDETILPHMLKDGGLGQQAPTRPLPSAE 
LTTASLAFTTFDLLGHKQTRWFWNSDLPATNGIAFLGYCADCPQILESKEELNALMTDERISH 
VPVLILISKLDRTGTISEEKLCLPFGLYGQTTGKGTCDPEGAKCLFQGIVHVQCVQQQGYGKG
FCWFAQYID 
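The ORF coordinates and the stated protein length in a sequence analysis table are related by simple arithmetic: the number of encoded residues equals the distance between the first base of the start codon and the first base of the stop codon, divided by three. The following is a minimal Python sketch of that consistency check, assuming both positions are 1-based and refer to the first base of the respective codon. The function name `orf_length_aa` is illustrative and not part of the source.

```python
def orf_length_aa(orf_start: int, orf_stop: int) -> int:
    """Number of residues encoded by an ORF.

    `orf_start` is the 1-based position of the first base of the start codon
    and `orf_stop` is the 1-based position of the first base of the stop codon.
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


if __name__ == "__main__":
    # NOV61a: ORF Start at 9, ORF Stop at 603, stated length 198 aa.
    print(orf_length_aa(9, 603))
```

For NOV61a the result agrees with the stated length of 198 aa. A mismatch for another entry would indicate that one of the coordinates or the residue count was altered during text extraction.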



Further analysis of the NOV61a protein yielded the following properties shown in
Table 61B.

Table 61B. Protein Sequence Properties NOV61a

PSort analysis: 0.7480 probability located in microbody (peroxisome); 0.1830 probability
located in lysosome (lumen); 0.1000 probability located in mitochondrial matrix space;
0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV61a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 61C.

Table 61C. Geneseq Results for NOV61a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV61a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAG74282 | Human colon cancer antigen protein SEQ ID NO:5046 - Homo sapiens, 201 aa. [WO200122920-A2, 05-APR-2001] | 1..198; 4..201 | 120/198 (60%); 138/198 (69%) | 4e-59
AAU27771 | Human full-length polypeptide sequence #96 - Homo sapiens, 198 aa. [WO200164834-A2, 07-SEP-2001] | 1..198; 1..198 | 120/198 (60%); 138/198 (69%) | 4e-59
AAB56779 | Human prostate cancer antigen protein sequence SEQ ID NO:1357 - Homo sapiens, 201 aa. [WO200055174-A1, 21-SEP-2000] | 1..198; 4..201 | 120/198 (60%); 138/198 (69%) | 4e-59
ABB57346 | Mouse ischaemic condition related protein sequence SEQ ID NO:970 - Mus musculus, 198 aa. [WO200188188-A2, 22-NOV-2001] | 1..198; 1..198 | 118/198 (59%); 136/198 (68%) | 3e-57
AAB74777 | Human SAR1 protein SEQ ID NO:4 - Homo sapiens, 198 aa. [CN1274727-A, 29-NOV-2000] | 1..198; 1..198 | 115/198 (58%); 138/198 (69%) | 1e-56



In a BLAST search of public sequence databases, the NOV61a protein was found to
have homology to the proteins shown in the BLASTP data in Table 61D.

Table 61D. Public BLASTP Results for NOV61a

Protein Accession Number | Protein/Organism/Length | NOV61a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
AAM69363 | GTP-binding protein Sara - Homo sapiens (Human), 198 aa. | 1..198; 1..198 | 120/198 (60%); 139/198 (69%) | 4e-59
Q99JZ4 | SAR1 protein - Mus musculus (Mouse), 198 aa. | 1..198; 1..198 | 120/198 (60%); 138/198 (69%) | 9e-59
Q9NR31 | GTP-binding protein SAR1a (COPII-associated small GTPase) - Homo sapiens (Human), 198 aa. | 1..198; 1..198 | 120/198 (60%); 138/198 (69%) | 1e-58
Q9QVY2 | SAR1B protein promoting vesicle budding from the endoplasmic reticulum - Cricetulus griseus (Chinese hamster), 198 aa. | 1..198; 1..198 | 118/198 (59%); 137/198 (68%) | 4e-58
P36536 | GTP-binding protein SAR1a - Mus musculus (Mouse), 198 aa. | 1..198; 1..198 | 118/198 (59%); 136/198 (68%) | 5e-57



PFam analysis predicts that the NOV61a protein contains the domains shown in the
Table 61E.

Table 61E. Domain Analysis of NOV61a

Pfam Domain | NOV61a Match Region | Identities; Similarities for the Matched Region | Expect Value
arf | 8..198 | 55/193 (28%); 130/193 (67%) | 1e-12



Example 62. 

The NOV62 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 62A. 



Table 62A. NOV62 Sequence Analysis

NOV62a,
CG137566-01 DNA
Sequence

SEQ ID NO: 229

545 bp



GGAACGTGGCTGGTTGGAGGAGGTAGATCACCCTTTCTGCGGGGGACGATTTCGTCGGTGGTA 



GGCTGCTACCATGAGGTTGAATCAGAACACCTTGCTGCTGGGGAAGAAGGTGGTCCTTGTACC 



CTACACCTCGGAGCATGTGCCCAGGTACCACGAGTGGATGAAATCAGAGGAGCTGCAGCGTTT 
GACAGCCTCGGAGCCGCTGACCCTGGAGCAGGAGTATGCCATGCAGTGCAGCTGGCAGGAAGA 
TGCAGACAAGTGTACCTTCATTGTGCTGGATGCCGAGAAGTGGCAGGCCCAGCCAGGCGCCAC 
CGAAGAGAGCTGCATGGTGGGAGATGTGAACCTCTTCCTCACAGATCTAGAAGACCTCACCTT 
GGGGGAGATCGAGGTCATGATTGCAGAGCCCAGCTGCAGGGGTAAGGGCCTTGGCACTGAGGC
CGTTCTCGCGATGCTGTCTTACGAAACTTCACTTTGAGCAGGTGGCTACGAGCAGTGTTTTTC 



AGGAGGTGACCCTCAGACTGACAGTGAGTGAGTCCGAGCAT



ORF Start: ATG at 74 



ORF Stop: TGA at 476 



SEQ ID NO: 230 



134 aa 



MW at 15130.1kD



NOV62a, 

CG137566-01 Protein 
Sequence 



MRLNQNTLLLGKKVVLVPYTSEHVPRYHEWMKSEELQRLTASEPLTLEQEYAMQCSWQEDADK
CTFIVLDAEKWQAQPGATEESCMVGDVNLFLTDLEDLTLGEIEVMIAEPSCRGKGLGTEAVLA 
MLSYETSL 



SEQ ID NO: 231 



709 bp 



NOV62b,
CG137566-02 DNA
Sequence



AGGCTGCTACCATGAGGTTGAATCAGAACACCTTGCTGCTGGGGAAGAAGGTGGTCCTTGTAC 



CCTACACCTCGGAGCATGTCCCCAGCAGGTACCACGAGTGGATGAAATCAGAGGAGCTGCAGC
GTTTGACAGCCTCGGAGCCGCTGACCCTGGAGCAGGAGTATGCCATGCAGTGCAGCTGGCAGG 
AAGATGCAGACAAGTGTACCTTCATTGTGCTGGATGCCGAGAAGTGGCAGGCCCAGCCAGGCG 
CCACCGAAGAGAGCTGCATGGTGGGAGATGTGAACCTCTTCCTCACAGATCTAGAAGACCTCA 
CCTTGGGGGAGATCGAGGTCATGATTGCAGAGCCCAGCTGCAGGGGTAAGGGCCTTGGCACTG
AGGCCGTTCTCGCGATGCTGTCTTACGGAGTGACCACGCTAGGTCTGACCAAGTTTGAGGCTA 
AAATTGGGCAAGGAAATGAACCAAGCATCCGGATGTTCCAGAAACTTCACTTTGAGCAGGTGG 
CTACGAGCAGTGTTTTTCAGGAGGTGACCCTCAGACTGACAGTGAGTGAGTCCGAGCATCAGT 
GGCTTCTGGAGCAGACCAGCCACGTGGAAGAGAAGCCTTACAGAGATGGGTCGGCAGAGCCCT 
GCTGATGGCTGGGCCTTGTGGGCAGCCACTCTGTGTGAGCAGGGTGTTGGGCCCATACACTTC
AAAGACCAGAGCCCTG 



ORF Start: ATG at 12 



ORF Stop: TGA at 633

SEQ ID NO: 232

207 aa

MW at 23361.3kD

NOV62b,
CG137566-02 Protein
Sequence

MRLNQNTLLLGKKVVLVPYTSEHVPSRYHEWMKSEELQRLTASEPLTLEQEYAMQCSWQEDAD
KCTFIVLDAEKWQAQPGATEESCMVGDVNLFLTDLEDLTLGEIEVMIAEPSCRGKGLGTEAVL
AMLSYGVTTLGLTKFEAKIGQGNEPSIRMFQKLHFEQVATSSVFQEVTLRLTVSESEHQWLLE 
QTSHVEEKPYRDGSAEPC 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 62B. 



Table 62B. Comparison of NOV62a against NOV62b.

Protein Sequence | NOV62a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV62b | 1..130; 1..131 | 130/131 (99%); 130/131 (99%)

Further analysis of the NOV62a protein yielded the following properties shown in
Table 62C.

Table 62C. Protein Sequence Properties NOV62a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in
mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000
probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV62a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 62D.

Table 62D. Geneseq Results for NOV62a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV62a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM39602 | Human polypeptide SEQ ID NO 2747 - Homo sapiens, 206 aa. [WO200153312-A1, 26-JUL-2001] | 1..130; 1..130 | 130/130 (100%); 130/130 (100%) | 6e-72
AAU23355 | Novel human enzyme polypeptide #441 - Homo sapiens, 209 aa. [WO200155301-A2, 02-AUG-2001] | 1..130; 4..133 | 130/130 (100%); 130/130 (100%) | 6e-72
AAM41388 | - Homo sapiens, 210 aa. [WO200153312-A1, 26-JUL-2001] | 4..134 | 130/131 (99%) | 1e-70
AAB41504 | Human ORFX ORF1268 polypeptide sequence SEQ ID NO:2536 - Homo sapiens, 207 aa. [WO200058473-A2, 05-OCT-2000] | 1..130; 1..131 | 129/131 (98%); 129/131 (98%) | 1e-69
AAG03244 | Human secreted protein, SEQ ID NO: 7325 - Homo sapiens, 89 aa. [EP1033401-A2, 06-SEP-2000] | 1..88; 1..88 | 88/88 (100%); 88/88 (100%) | 2e-47


In a BLAST search of public sequence databases, the NOV62a protein was found to
have homology to the proteins shown in the BLASTP data in Table 62E.

Table 62E. Public BLASTP Results for NOV62a

Protein Accession Number | Protein/Organism/Length | NOV62a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9BTD0 | DKFZP564C103 protein - Homo sapiens (Human), 206 aa. | 1..130; 1..130 | 130/130 (100%); 130/130 (100%) | 2e-71
Q9Y3T3 | Hypothetical 23.3 kDa protein - Homo sapiens (Human), 206 aa. | 1..130; 1..130 | 129/130 (99%); 129/130 (99%) | 1e-70
Q9BTE0 | DKFZP564C103 protein - Homo sapiens (Human), 207 aa. | 1..130; 1..131 | 130/131 (99%); 130/131 (99%) | 4e-70
Q9D151 | 1110028N05Rik protein (RIKEN cDNA 1110028N05 gene) - Mus musculus (Mouse), 241 aa. | 1..130; 1..130 | 109/130 (83%); 117/130 (89%) | 2e-58
Q9D7G2 | 1110028N05Rik protein - Mus musculus (Mouse), 211 aa. | 31..130; 1..100 | 82/100 (82%); 87/100 (87%) | 3e-40

PFam analysis predicts that the NOV62a protein contains the domains shown in the
Table 62F.

Table 62F. Domain Analysis of NOV62a

Pfam Domain | NOV62a Match Region | Identities; Similarities for the Matched Region | Expect Value

Example 63.

The NOV63 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 63A. 



Table 63A. NOV63 Sequence Analysis 



NOV63a,
CG137707-01 DNA
Sequence

SEQ ID NO: 233



683 bp 



GGTGTGCTGCCAGAGATTTTGCCTCTTCAAGGTGAATGCGGCTTCAAGGGGCTATCTTTGTGC
TCCTGCCCCACCTGGGGCCCATCCTGGTCTGGCTGTTCACTCGTGATCACATGTCTGGTTGGT 
GTGAGGGCCCGAGGATGCTGTCCTGGTGCCCATTCTACAAAGTCTTATTGCTTGTACAGACAG 
CCATCTACTCTGTCGTGGGGTATGCCTCCTACCTGGTGTGGAAGGACCTGGGAGGGGGCTTGG 
GGTGGCCCCTGGCCCTGCCTCTTGGCCTCTATGCTGTTCAGCTCACCATCAGCTGGACTGTCC 
TGGTTCTCTTTTTCACAGTCCACAACCCCGGCCTCTATGCCCAGGCCCTGCTGCACCTGCTGC 
TGCTGTATGGGCTGGTGGTGAGCACAGCACTGATCTGGCATCCCATCAACAAACTGGCTGCCC
TGTTACTGCTGCCCTACCTAGCCTGGCTCACCGTGACTTCAGCCCTCACCTACCACCTGTGGA
GGGACAGCCTTTGTCCAGTGCACCAGCCTCAGCCCACGGAGAAGAGTGACTGAGGCCCTAGGG
CATGGGAGAGGAGGGACGCCCAGGGTGGGGAGGAAGAGTCTGCAAGCAGGGCTGTGGAGTTAG



GGTTCACCCCAATGGGACCACCCTCCTGGGTCCCCTGGTGCCGTTTTTCCTTA 



ORF Start: ATG at 36 



ORF Stop: TGA at 555 



SEQ ID NO: 234 



173 aa 



MW at 19491.1kD

NOV63a,
CG137707-01 Protein
Sequence

MRLQGAIFVLLPHLGPILVWLFTRDHMSGWCEGPRMLSWCPFYKVLLLVQTAIYSVVGYASYL
VWKDLGGGLGWPLALPLGLYAVQLTISWTVLVLFFTVHNPGLYAQALLHLLLLYGLVVSTALI
WHPINKLAALLLLPYLAWLTVTSALTYHLWRDSLCPVHQPQPTEKSD



SEQ ID NO: 235

624 bp

NOV63b,
CG137707-02 DNA
Sequence



AGAGATTTGCCTCTTCAAGGTGAATGCGGCTTCAAGGGGCTATCTTTGTGCTCCTGCCCCACC



TGGGGCCCATCCTGGTCTGGCTGTTCACTCGTGATCACATGTCTGGTTGGTGTGAGGGCCCGA 
GGATGCTGTCCTGGTGCCCATTCTACAAAGTCTTATTGCTTGTACAGACAGCCATCTACTCTG 
TCGTGGGCTATGCCTCCTACCTGGTGTGGAAGGACCTGGGAGGGGGCTTGGGGTGGCCCCTGG 
CCCTGCCTCTTGGCCTCTATGCTGTTCAGCTCACCATCAGCTGGACTGTCCTGGTTCTCTTTT 
TCACAGTCCACAACCCTGGTCTGGCCCTGCTGCACCTGCTGCTGCTGTATGGGCTGGTGGTGA 
GCACAGCACTGATCTGGCATCCCATCAACAAACTGGCTGCCCTGTTACTGCTGCCCTACCTAG 
CCTGGCTCACCGTGACTTCAGCCCTCACCTACCACCTGTGGAGGGACAGCCTTTGTCCAGTGC
ACCAGCCTCAGCCCACGGAGAAGAGTGACTGAGGCCCTAGGGCATGGGAGAGGAGGGACGCCC
AGGGTGGGGAGGAAGAGTCTGCAAGCAGGGCTGTGGAGTTAGGGTTCACCCCAATGG 



ORF Start: ATG at 24 



ORF Stop: TGA at 534



SEQ ID NO: 236 



170 aa 



MW at 19128.7kD

NOV63b,
CG137707-02 Protein
Sequence

MRLQGAIFVLLPHLGPILVWLFTRDHMSGWCEGPRMLSWCPFYKVLLLVQTAIYSVVGYASYL
VWKDLGGGLGWPLALPLGLYAVQLTISWTVLVLFFTVHNPGLALLHLLLLYGLVVSTALIWHP
INKLAALLLLPYLAWLTVTSALTYHLWRDSLCPVHQPQPTEKSD



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 63B. 



Table 63B. Comparison of NOV63a against NOV63b.

Protein Sequence | NOV63a Residues; Match Residues | Identities; Similarities for the Matched Region
NOV63b | 1..173; 1..170 | 130/173 (75%); 130/173 (75%)

Further analysis of the NOV63a protein yielded the following properties shown in
Table 63C.

Table 63C. Protein Sequence Properties NOV63a

PSort analysis: 0.6850 probability located in endoplasmic reticulum (membrane); 0.6400
probability located in plasma membrane; 0.4600 probability located in Golgi body; 0.1000
probability located in endoplasmic reticulum (lumen)

SignalP analysis: Cleavage site between residues 59 and 60

A search of the NOV63a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 63D.



Table 63D. Geneseq Results for NOV63a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV63a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAG89291 | Human secreted protein, SEQ ID NO: 411 - Homo sapiens, 170 aa. [WO200142451-A2, 14-JUN-2001] | 1..173; 1..170 | 170/173 (98%); 170/173 (98%) | 3e-98
AAM80031 | Human protein SEQ ID NO 3677 - Homo sapiens, 170 aa. [WO200157190-A2, 09-AUG-2001] | 1..173; 1..170 | 170/173 (98%); 170/173 (98%) | 3e-98
AAM79047 | Human protein SEQ ID NO 1709 - Homo sapiens, 170 aa. [WO200157190-A2, 09-AUG-2001] | 1..173; 1..170 | 170/173 (98%); 170/173 (98%) | 3e-98
ABB12039 | Human benzodiazepine receptor-like protein homologue, SEQ ID NO:2409 - Homo sapiens, 170 aa. [WO200157188-A2, 09-AUG-2001] | 1..173; 1..170 | 170/173 (98%); 170/173 (98%) | 3e-98
ABG21471 | Novel human diagnostic protein #21462 - Homo sapiens, 171 aa. [WO200175067-A2, 11-OCT-2001] | 1..173; 1..171 | 125/176 (71%); 134/176 (76%) | 3e-61



In a BLAST search of public sequence databases, the NOV63a protein was found to
have homology to the proteins shown in the BLASTP data in Table 63E.

Table 63E. Public BLASTP Results for NOV63a

Protein Accession Number | Protein/Organism/Length | NOV63a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
[illegible] | [illegible] (Benzodiazapine receptor (Peripheral) (MBR, PBR, PBKS, IBP, isoquinoline-binding protein)) like protein) - Homo sapiens (Human), 170 aa. | 1..170 | [illegible]; 170/173 (98%) |
Q9CRZ8 | 2510027D20Rik protein - Mus musculus (Mouse), 248 aa (fragment). | 1..166; 87..248 | 111/166 (66%); 126/166 (75%) | 6e-58
[illegible] | [illegible] (PBR) (PKBS) (Isoquinoline-binding protein) (IBP) - Bos taurus (Bovine), 169 aa. | [illegible]; 11..158 | [illegible]; 82/152 (53%) |
Q96TF6 | DJ526I14.1 (benzodiazapine receptor (peripheral) (PBR, PKBS, mitochondrial benzodiazepine, MBR) (isoform 1)) - Homo sapiens (Human), 169 aa. | 8..159; 11..158 | 63/152 (41%); 83/152 (54%) | 1e-23
Q99M32 | Similar to benzodiazepine receptor, peripheral - Mus musculus (Mouse), 169 aa. | 10..159; 13..158 | 60/150 (40%); 81/150 (54%) | 3e-23



PFam analysis predicts that the NOV63a protein contains the domains shown in the
Table 63F.

Table 63F. Domain Analysis of NOV63a

Pfam Domain | NOV63a Match Region | Identities; Similarities for the Matched Region | Expect Value
TspO_MBR | 1..157 | 72/160 (45%); 148/160 (92%) | 4.5e-66

Example 64.

The NOV64 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 64A.

Table 64A. NOV64 Sequence Analysis

SEQ ID NO: 237

2560 bp

NOV64a,
CG138033-01 DNA
Sequence

TTGGGAGTCTGAGAAGTCACCACCATGAAGTTGTTCGGCTTCAGGAGCCGCAGGGGCCAGACG
GTCCTGGGCTCCATAGACCACCTCTACACGGGTTCCGGGTACCGAATCCGGTACTCGGAACTG



CAGAAGATCCACAAGGCAGCTGTCAAGGGCGACGCTGCGGAGATGGAGCGCTGTCTGGCGCGC 

AGGAGCGGAGACCTTGACGCCCTGGACAAGCAGCACAGGACTGCTCTACATTTGGCCTGTGCC 

AGTGGCCATGTGAAAGTGGTCACTCTCCTGGTTAACAGAAAATGCCAGATTGATATCTATGAC 

AAAGAAAATAGAACGCCTTTGATACAGGCTGTCCATTGCCAGGAAGAGGCTTGTGCCGTTATT 

CTGCTGGAACATGGTGCCAATCCAAACCTTAAGGATATCTACGGCAACACTGCTCTCCATTAT 

GCTGTGTATAGTGAGAGCACCTCACTGGCAGAAAAACTGCTTTTCCATGGTGAAAATATTGAA 

GCACTGGACAAGGACAGTAATACCGCACTTTTATTCGCTATAATTTGCAAGAAAGAGAAAATG 

GTGGAATTTTTATTGAAAAACAAAGCAAGTACACATGCCGTTGATAGGCTGAGAAGAACAGCC 

CTCATGCTTGCTGTGCACTATGACTCACTGGGTATTGTCAACATCCTTCTTAAGCAAAGTATT

AATGTCTTTACTCAAGACATGTGTGGACGAGATGCAGAAGATTACGCTATTTCTTGCCGTTTG 

ACAAAGATTCAACAACAAATTTTGGAGCATAAAAAGATGATACTTAAAAATGACAAAATAGAT

GTTGGAAGTTCTGATGAATCTGCAGTCAGCATTTTCCATGAACTGTGTGTGGATTCATTGTCT 

GCATTGGATGACGAACTCTTGAGTGTTGCTGCTAAGCAGTGTGTCCCCGAGAAAGTGTCAGAG

CCTTTACGTGGACCTTCCCATGGAAAAGGAAACAGAATAGTCAATGGAAAAGGAGAAGGTCCT

CCTGCAAAACATCCTTCCTTGAAGCCTAGCACTGAAATGGAAGATCCTGCTGTGAAAGGAGCA 

GTACAAAAAAGAATGTACATGAATTTGTCAACAGAACAAGCCTTACCAGTGGCTTCAGAGGAA 

GAACAGCAAAGGCGTGAAAGAAGTGAAAAGAAGCAACCACAGGTATATGAAGGAAATAATACA 

TACAAAAGTGAAAAAATACAACTATCAGAAAATATATGTCATAGTACATTTTCTGCTGCTGCT

GACAGATTAACCCAACAAAGAAAGATTGGGAAAACATATCCTCAGCAATTTCCCAAGAAACTG 

AAGGAAGAGCATGATAGGTGCACCTTAAAACAAGAAAATGAAGAAAAAACAAATGTTAATATG 

CTGCACAAAAAAAATCGAGAAGAATTAGAAAGGAAAGAGAAACAATATAAGAAAGAAGTTGAA 

GCAAAACAACTTGAACCAACTGTTCAATCACTAGAGATGAAACCAAAGACTGCAAGAAATACT

CCAAATCAGGATTTTCATAATCATGAAGAAGTGAAAGATCTGATGGATGAAAATTGCATTTTG 

AAGACAGATATTGCTATACTCAGACAGGAAATATGCACAATGAAAAATGACAACCTGGAAAAA 

GAAAATAAATATCTTAAGGACATTAAAATTGCTAAAGAAACAAATGCTGCCCTTGAAAAGTGT

ATAAAACTCAATGAGGAAATGATAACAAAAACAGCATTCTGGTATCAACAAGAGCTTAATGAT 

CTCAAAGCTGAGAATACAAGGCTCAATTCTGAACTGTTGAAGGAAAAAGAAAGCAAGAAAAAA 

CTGGAAGCTGAAATTGAATCTTATCAGTCTAGACTGGCTGCTGCTATAAGTAAACACAGTGAA 

AATGTGAAAACAGAAAGAAACCTAAAACTTGCTTTAGAGAGAACACAAGATATTTCTGAGCAA 

GTAAAAATGAGTTCTGATATTTCCGAAATAGAAGATAAGAATGAGTTTCTTACTGAACAACTT 

TCTAAAATGCAAATTAAATTCAATACCTTAAAAGATAAGTTCCGTAAGACAAGAGATACTCTC 

AGAAAAAAGTCATTGGCTTTAGAAACTGTACAAAACGACCTAAGCCAAACACAGCAGCAAATA

AAGGAAATGAAAGAGATGTATCAAAGTGCAGAAGCTAAAGTCAGTAAATCCACTGGAAAGTGG 

AACTGTGTGGAAGAGAGGATATGTCAACTCCAACGTGAAAATCCGTGGCTTGAACAGCAACTA 

GTTGATGTTCATCAGAAAGAGGATCATAAAGAGATAGTAATTAATATCCAAAGAGGCTTTATT 

GAGAGTAGAAAGAAAGACCTCATGCTAGAAGAGAAAAATAGAAAGCTAATGAATGAATATGAT 

CATTTAAAAGAAAGTCTCTTTCAATATGAGAGACAGAAAGCAGAAACAGTAGTAAGTATCAAG 

GAAGATAAATATTTTCAAACTTCTAGAAAGAAAGTTTAAACATTTGGTTCTGGATACATGTTG

AACCTAGTTGAATATAAAAATCAGTAGGATAAAAAGTGTG 



ORF Start: ATG at 25 



ORF Stop: TAA at 2494 
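
The ORF coordinates can be cross-checked against the stated protein length. The following is a minimal sketch, assuming that both positions are 1-based and that the stop position refers to the first base of the stop codon; the function name `orf_codon_count` is hypothetical and not part of the original disclosure.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Return the number of coding codons in an ORF.

    Assumes `start` is the 1-based position of the A in the ATG and
    `stop` is the 1-based position of the first base of the stop codon.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Coordinates quoted for NOV64a (ATG at 25, TAA at 2494).
print(orf_codon_count(25, 2494))
```

Under these assumptions the codon count equals the length of the encoded polypeptide, so it can be compared directly with the "aa" figure given for the protein sequence.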



NOV64a, CG138033-01 Protein Sequence

SEQ ID NO: 238

823 aa

MW at 94994.5kD



MKLFGFRSRRGQTVLGSIDHLYTGSGYRIRYSELQKIHKAAVKGDAAEMERCLARRSGDLDAL 
DKQHRTALHLACASGHVKWTLLVNRKCQIDIYDKENRTPLIQAVHCQEEACAVILLEHGANP 
NLKDIYGNTALHYAVYSESTSLAEKLLFHGENIEALDKDSNTPLLFAIICKKEKMVEFLLKNK 
ASTHAVDRLRRTALMLAVHYDSLGIVNILLKQSINVFTQDMCGRDAEDYAISCRLTKIQQQIL
EHKKMILKNDKIDVGSSDESAVSIFHELCVDSLSALDDELLSVAAKQCVPEKVSEPLRGPSHG 
KGNRIVNGKGEGPPAKHPSLKPSTEMEDPAVKGAVQKRMYMNLSTEQALPVASEEEQQRRERS 
EKKQPQVYEGNNTYKSEKIQLSENICHSTFSAAADRLTQQRKIGKTYPQQFPKKLKEEHDRCT 
LKQENEEKTNVNMLHKKNREELERKEKQYKKEVEAKQLEPTVQSLEMKPKTARNTPNQDFHNH
EEVKDLMDENCILKTDIAILRQEICTMKNDNLEKENKYLKDIKIAKETNAALEKCIKLNEEMI 
TKTAFWYQQELNDLKAENTRLNSELLKEKESKKKLEAEIESYQSRLAAAISKHSENVKTERNL 
KLALERTQDISEQVKMSSDISEIEDKNEFLTEQLSKMQIKFNTLKDKFRKTRDTLRKKSLALE 
TVQNDLSQTQQQIKEMKEMYQSAEAKVSKSTGKWNCVEERICQLQRENPWLEQQLVDVHQKED 
HKEIVINIQRGFIESRKKDLMLEEKNRKLMNEYDHLKESLFQYERQKAETVVSIKEDKYFQTS
RKKV 
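
The relationship between the DNA listing and the protein listing can be illustrated with a short translation routine. This is only a sketch, assuming the standard genetic code and an in-frame input; the names `CODON_TABLE` and `translate` are hypothetical. Whitespace is stripped first because the scanned listing contains stray spaces inside some sequence lines.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base over "TCAG".
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon.

    Raises KeyError if a codon contains a character other than A, C, G or T.
    """
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# Start of the last full-length line of the NOV64a DNA listing (in frame).
print(translate("GAAGATAAATATTTTCAAACTTCTAGAAAGAAAGTTTAAACATTTGG"))
```

Applied to that fragment, the routine yields the C-terminal residues of the NOV64a polypeptide and stops at the TAA codon.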



Further analysis of the NOV64a protein yielded the following properties shown in 
Table 64B. 



Table 64B. Protein Sequence Properties NOV64a 






PSort analysis: 0.7600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV64a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 64C.

Table 64C. Geneseq Results for NOV64a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV64a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
ABG08398 | Novel human diagnostic protein #8389 - Homo sapiens, 784 aa. [WO200175067-A2, 11-OCT-2001] | 370..807; 12..449 | 378/438 (86%); 407/438 (92%) | 0.0
ABG04713 | Novel human diagnostic protein #4704 - Homo sapiens, 1090 aa. [WO200175067-A2, 11-OCT-2001] | 440..807; 1..368 | 319/368 (86%); 345/368 (93%) | 0.0
AAE01039 | Human death domain-containing receptor (DDCR) protein from HDPDL31 clone - Homo sapiens, 281 aa. [WO200129063-A2, 26-APR-2001] | 181..461; 1..281 | 236/281 (83%); 252/281 (88%) | e-129
ABG01862 | Novel human diagnostic protein #1853 - Homo sapiens, 307 aa. [WO200175067-A2, 11-OCT-2001] | 111..385; 55..301 | 178/297 (59%); 192/297 (63%) | 8e-84
ABG04687 | Novel human diagnostic protein #4678 - Homo sapiens, 418 aa. [WO200175067-A2, 11-OCT-2001] | 3..164; 130..291 | 142/162 (87%); 148/162 (90%) | 9e-77
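
The entries of Table 64C can be sanity-checked programmatically. The sketch below is illustrative only: the helper names `parse_match`, `truncated_percent` and `range_length` are hypothetical, and the use of truncation rather than rounding for the percentage is an inference from the first row (407/438 is about 92.9% but is printed as 92%), not something the source states.

```python
import re


def parse_match(cell: str) -> tuple[int, int, int]:
    """Parse a cell such as '378/438 (86%)' into (matched, aligned, percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognised identity cell: {cell!r}")
    matched, aligned, percent = (int(g) for g in m.groups())
    return matched, aligned, percent


def truncated_percent(matched: int, aligned: int) -> int:
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return (100 * matched) // aligned


def range_length(span: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start, end = (int(part) for part in span.split(".."))
    return end - start + 1


# First row of Table 64C (ABG08398).
matched, aligned, printed = parse_match("378/438 (86%)")
print(truncated_percent(matched, aligned) == printed)
print(range_length("370..807"), range_length("12..449"))
```

For an ungapped alignment, both residue ranges have the same length as the denominator of the identity fraction; where the alignment contains gaps, the denominator can exceed either range.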



In a BLAST search of public sequence databases, the NOV64a protein was found to
have homology to the proteins shown in the BLASTP data in Table 64D.



Table 64D. Public BLASTP Results for NOV64a 


Protein Accession Number | Protein/Organism/Length | NOV64a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9H0H6 | Hypothetical 94.0 kDa protein - Homo sapiens (Human), 823 aa. | 1..823; 1..823 | 717/823 (87%); 762/823 (92%) | 0.0
CAB99338 | BA255A11.3 (novel protein similar to KIAA1074) - Homo sapiens (Human), 1117 aa. | 2..796; [illegible] | 279/820 (34%); [illegible] | e-103
Q9H560 | BA526D8.2 (novel protein similar to KIAA1074) - Homo sapiens (Human), 264 aa. | 2..261; 3..262 | 169/260 (65%); 212/260 (81%) | 1e-94
Q9H1Q1 | BA145E8.1 (KIAA1074) - Homo sapiens (Human), 1710 aa. | 431..812; 780..1223 | 195/444 (43%); 275/444 (61%) | 8e-89
Q9UPS8 | KIAA1074 protein - Homo sapiens (Human), 1709 aa. | 431..812; 779..1222 | 195/444 (43%); 275/444 (61%) | 8e-89



PFam analysis predicts that the NOV64a protein contains the domains shown in
Table 64E.



Table 64E. Domain Analysis of NOV64a

Pfam Domain | NOV64a Match Region | Identities; Similarities for the Matched Region | Expect Value
ank | 66..98 | 14/33 (42%); 28/33 (85%) | 1.9e-07
ank | 99..131 | 12/33 (36%); 25/33 (76%) | 1.2e-05
ank | 132..164 | 11/33 (33%); 27/33 (82%) | 7.9e-06
ank | 165..197 | 11/33 (33%); 26/33 (79%) | 0.00014
ank | 198..230 | 11/33 (33%); 23/33 (70%) | 0.0032

Example 65.

The NOV65 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 65A. 



Table 65A. NOV65 Sequence Analysis

NOV65a, CG138043-01 DNA Sequence

SEQ ID NO: 239

14994 bp


GCTTCTCGACTGGGACGCGGCGCGAGGAGGGAGCGCGGCGGCCCCGAGTCTCCCTGAGCCATG 


GGCAACGAGGCGAGCTTGGAAGGGGAAGGGCTCCCCGAAGGGCTGGCGGCGGCCGCAGCGGCT 
GGAGGAGGAGCTAGCGGGGCGGGGAGCCCCTCTCACACCGCGATCCCGGCCGGCATGGAGGCG 
GATTTGAGCCAGCTGAGCGAAGAGGAGAGGAGACAGATCGCCGCTGTCATGTCAAGGGCGCAG 
GGGCTGCCCAAGGGAAGCGTCCCCCCGGCCGCTGCGGAGTCGCCTTCCATGCACAGGAAACAA 
GAGTTGGATAGTAGTCATCCTCCAAAGCAATCAGGAAGACCCCCGGACCCTGGGCGTCCAGCT 
CAACCTGGTCTCAGTAAAAGTAGAACTACAGACACTTTCAGGTCAGAGCAGAAATTGCCTGGG 






AGGAGTCCTTCCACTATTAGCTTGAAAGAATCAAAGTCCAGAACTGATTTAAAGGAAGAGCAC
AAGTCTAGTATGATGCCTGGCTTCCTCTCAGAGGTTAACGCTTTAAGTGCTGTTTCCTCTGTT
GTAAATAAATTCAACCCTTTTGATTTGATATCAGACTCTGAGGCATCCCAGGAAGAAACCACC
AAGAAACAAAAAGTGGTTCAGAAGGAGCAAGGAAAACCTGAAGGAATCATAAAACCTCCTTTA
CAACAACAGCCACCCAAGCCGATTCCTAAGCAGCAAGGACCTGGTAGGGATCCGCTTCAGCAG
GATGGCACTCCCAAATCAATATCTTCTCAACAACCAGAAAAAATTAAATCACAACCTCCAGGT
ACAGGAAAGCCAATTCAGGGTCCTACCCAGACTCCTCAGACAGACCATGCAAAATTGCCACTT
CAACGAGATGCATCCAGGCCTCAGACTAAACAGGCAGACATAGTAAGGGGAGAATCAGTTAAA
CCCTCACTGCCAAGCCCATCCAAACCACCTATTCAGCAACCAACTCCTGGAAAACCTCCAGCA
CAGCAGCCTGGACATGAAAAATCACAGCCTGGGCCTGCAAAGCCCCCAGCTCAGCCCTCAGGG

CTAACAAAGCCATTGGCTCAACAACCAGGGACAGTGAAACCCCCAGTCCAGCCACCAGGGACA
ACAAAGCCTCCAGCTCAGCCTCTTGGTCCTGCTAAGCCTCCAGCTCAGCAGACTGGGTCAGAG 
AAGCCTTCATCGGAGCAGCCTGGGCCAAAGGCTTTAGCTCAGCCTCCTGGAGTTGGAAAGACT 
CCAGCTCAACAGCCAGGGCCAGCAAAGCCTCCAACCCAGCAGGTGGGGACACCAAAACCCCTA
GCTCAACAACCTGGGCTACAGTCTCCAGCTAAGGCACCTGGGCCTACAAAGACTCCAGCTCAG
ACAAAGCCCCCATCTCAACAGCCTGGCTCAACAAAACCCCCACCTCAACAGCCTGGCCCAGCA

AAGCCCTCACCTCAACAGCCTGGCTCAACAAAACCCCCATCTCAACAGCCTGGCTCAGCAAAA
CCCTCAGCTCAACAGCCTAGCCCAGCAAAGCCCTCGGCTCAGCAATTTACAAAACCAGTAAGC 
CAAACAGGATTTGGAAAACCTCTGCAGCCACCAACAGTGTCTCCATCTGCAAAACAGCCTCCT 
TCACAAGGCCTCCCTAAAACCATCTGTCCTCTTTGCAATACCACTGAACTTCTGTTGCATGTT
CCAGAAAAGGCCAATTTTAACACATGCACTGAGTGTCAAACCACTGTCTGTAGTCTCTGTGGT 
TTTAATCCCAATCCTCATTTAACGGAGGCAAAAGAGTGGCTCTGTTTGAACTGTCAAATGAAA 
AGAGCTCTAGGCGGGGATCTGGCTCCAGTTCCGTCATCACCCCAGCCCAAACTGAAGACTGCA
CCTGTTACCACTACATCAGCAGTGAGCAAATCATCCCCACAGCCACAGCAGACTTCCCCAAAG

AAGGATGCTGCACCAAAACAGGATCTCTCCAAGGCACCTGAGCCTAAAAAGCCACCACCGCTA
GTGAAACAACCAACCCTTCATGGCTCTCCTTCAGCCAAGGCCAAGCAGCCCCCTGAGGCAGAT
TCTTTGTCCAAGCCAGCCCCTCCCAAAGAACCTTCTGTCCCATCTGAGCAGGACAAGGCCCCT 
GTTGCTGATGATAAACCAAAGCAGCCCAAGATGGTAAAGCCAACCACTGACCTTGTATCTTCA
TCATCAGCAACAACAAAACCTGATATTCCAAGCTCCAAAGTACAGTCACAAGCTGAAGAGAAA
ACAACCCCTCCTCTAAAAACAGACTCTGCCAAACCCTCACAGAGTTTTCCACCAACAGGGGAA
AAAGTCACCCCATTTGATTCTAAAGCCATACCTCGACCTGCATCAGATTCAAAAATTATTTCA
CATCCTGGTCCCAGTTCAGAGAGCAAAGGTCAAAAACAAGTTGACCCCGTACAAAAGAAGGAA 
GAACCCAAGAAAGCACAAACCAAAATGAGTCCTAAACCAGATGCCAAGCCAATGCCAAAAGGG 
TCACCAACACCCCCTGGCCCACGACCTACCGCTGGCCAAACTGTCCCCACACCTCAACAGTCC
CCAAAGCCTCAGGAGCAGTCAAGGCGTTTCAGTCTGAATCTGGGAAGTATTACTGATGCCCCC 
AAATCACAGCCTACAACTCCTCAAGAGACCGTGACTGGGAAACTCTTTGGGTTTGGAGCATCA 
ATCTTCAGCCAGGCATCAAATTTAATTTCCACTGCAGGCCAACCTGGACCTCATTCACAAAGT
GGACCAGGGGCCCCAATGAAACAAGCCCCTGCCCCTTCACAGCCACCTACTTCACAAGGGCCA
CCCAAATCCACAGGTCAAGCACCACCAGCACCTGCAAAAAGTATACCTGTGAAAAAGGAAACA
AAAGCCCCAGCAGCTGAAAAATTAGAGCCCAAAGCTGAACAAGCTCCAACAGTAAAAAGAACA
GAAACAGAAAAAAAGCCACCACCTATTAAGGATAGCAAATCTTTAACAGCTGAGCCTCAAAAG
GCTGTCCTTCCCACAAAACTGGAGAAATCGCCCAAACCAGAATCAACCTGTCCTCTCTGCAAA
ACTGAACTCAACATAGGTTCTAAGGATCCTCCTAACTTCAATACTTGCACTGAATGCAAGAAT
CAAGTGTGTAATCTCTGTGGATTTAACCCTACACCACATTTGACTGAGATTCAAGAATGGCTT
TGTTTAAATTGCCAAACCCAGAGAGCAATATCAGGACAGCTTGGAGACATACGCAAAATGCCA
CCTGCACCATCAGGACCCAAAGCATCTCCTATGCCTGTTCCTACAGAATCATCATCTCAGAAA 
ACAGCAGTGCCTCCCCAAGTAAAATTAGTGAAAAAGCAAGAACAAGAAGTAAAAACGGAAGCT 
GAAAAAGTCATTCTGGAAAAAGTAAAGGAAACACTATCAATGGAAAAAATTCCTCCTATGGTA 
ACCACAGATCAAAAACAAGAAGAGAGTAAACTAGAGAAAGACAAAGCTTCAGCTCTTCAAGAA
AAAAAGCCACTCCCTGAAGAAAAAAAACTAATCCCTGAAGAAGAAAAGATACGTTCTGAAGAA 
AAAAAGCCACTCCTAGAAGAAAAAAAGCCAACCCCTGAAGACAAAAAGCTACTCCCAGAGGCA 
AAAACATCAGCCCCAGAAGAACAGAAACATGACTTACTTAAATCTCAAGTACAAATTGCTGAA 
GAAAAGCTTGAAGGCAGAGTGGCTCCAAAGACAGTGCAAGAAGGGAAACAACCACAGACCAAG
ATGGAAGGTTTACCATCTGGCACACCTCAGAGTTTACCTAAAGAAGATGATAAGACAACCAAA
ACAATAAAAGAACAGCCACAGCCACCATGCACAGCAAAACCTGATCAGGAAAAGGAAGATGAC
AAATCAGACACCTCAAGTTCTCAGCAGCCTAAAAGCCCCCAAGGTCTGAGCGACACGGGATAT
TCTTCCGATGGAATATCAAGCTCACTTGGTGAAATTCCAAGTCTTATTCCAACTGATGAAAAG
GATATTCTCAAGGGACTCAAAAAGGACTCTTTTTCACAAGAAAGCAGCCCTTCCAGCCCCTCA
GATTTGGCTAAGTTAGAAAGTACAGTCCTATCTATTTTGGAAGCTCAAGCAAGTACACTTGCT
GATGAAAAGTCAGAAAAGAAAACACAACCCCATGAAGTTTCTCCTGAACAGCCTAAAGACCAA

GAGAAAACTCAGAGTTTATCTGAAACCTTGGAAATTACTATTTCAGAAGAGGAGATCAAAGAG 
AGTCAAGAAGAAAGGAAAGACACTTTTAAAAAAGATAGCCAACAAGATATTCCTTCCAGCAAG
GACCATAAAGAGAAGTCTGAGTTTGTTGATGACATAACTACTAGAAGAGAGCCTTATGATTCA 
GTTGAAGAGAGTAGTGAAAGTGAAAACTCACCTGTTCCACAAAGAAAACGAAGAACTAGTGTT 
GGCTCATCAAGCAGTGATGAGTATAAACAGGAAGACAGCCAAGGATCAGGGGAAGAGGAGGAC 
TTCATTCGAAAACAAATCATAGAAATGAGTGCTGATGAAGATGCTTCAGGTTCTGAAGATGAT
GAGTTCATCAGAAACCAGCTCAAAGAGATTAGTAGCAGTACTGAGAGCCAGAAGAAGGAAGAA
ACAAAGGGAAAAGGCAAAATAACAGCAGGGAAACACAGACGACTGACTCGAAAAAGTAGCACA 
AGCATTGATGAAGATGCAGGAAGACGTCACTCATGGCATGATGAAGACGATGAAGCATTTGAT 
GAAAGTCCTGAACTTAAATACAGAGAAACTAAAAGTCAGGAAAGTGAAGAACTTGTAGTTACT
GGAGGAGGAGGGCTACGCCGATTTAAAACAATTGAGCTCAACAGTACAATAGCAGATAAATAT 
TCTGCAGAGTCATCACAGAAAAAAACAAGTTTGTATTTTGACGAAGAGCCAGAATTGGAAATG 
GAAAGCCTGACAGACTCACCTGAAGATAGGTCAAGGGGAGAGGGATCTTCGAGTCTGCATGCT 
TCCAGCTTCACTCCTGGTACATCCCCTACATCAGTATCATCACTTGATGAGGACAGTGACAGT 
AGCCCGAGTCACAAAAAAGGAGAGAGCAAACAGCAACGCAAAGCTCGGCACAGACCACATGGC 
CCTCTTTTGCCTACTATTGAAGATTCTTCAGAGGAAGAAGAATTGAGAGAGGAAGAAGAATTA 
TTAAAGGAGCAAGAAAAGCAGAGGGAAATAGAACAGCAACAAAGAAAGAGTTCTAGTAAAAAA 
TCAAAGAAAGACAAAGATGAACTTCGAGCTCAGAGAAGAAGGGAAAGGCCAAAGACACCACCT 
AGTAATCTCTCTCCCATTGAAGATGCATCTCCGACAGAAGAGTTACGTCAGGCTGCAGAAATG 
GAGGAGCTCCATAGATCTTCTTGTTCTGAATATTCACCTAGCATAGAGTCAGACCCAGAAGGT 
TTTGAAATAAGCCCGGAAAAAATAATAGAAGTACAAAAAGTTTATAAATTGCCCACAGCTGTT 
TCATTATACTCACCAACAGATGAGCAATCTATTATGCAGAAAGAAGGTAGCCAAAAGGCGTTA 
AAAAGTGCTGAGGAGATGTATGAAGAAATGATGCATAAAACACACAAATACAAAGCTTTTCCA 
GCTGCAAATGAACGAGATGAAGTGTTTGAAAAAGAGCCTTTGTATGGTGGGATGCTAATAGAG 
GATTATATTTATGAATCTTTAGTAGAAGACACGTACAATGGATCGGTAGATGGCAGTCTGCTA 
ACAAGGCAAGAAGAAGAAAATGGATTTATGCAGCAGAAAGGAAGAGAGCAAAAGATAAGACTT 
TCAGAACAGATTTATGAAGATCCTATGCAGAAAATTACAGACCTCCAGAAAGAGTTTTATGAG 
TTAGAAAGCTTACATTCTGTTGTGCCTCAGGAAGATATTGTTTCAAGCTCTTTTATCATCCCA
GAAAGCCATGAGATAGTGGACCTGGGTACTATGGTAACTTCTACAGAAGAAGAAAGGAAACTA
CTAGATGCTGATGCTGCCTATGAAGAACTTATGAAGAGGCAACAGATGCAATTAACACCTGGA 
TCTAGCCCAACCCAGGCCCCCATTGGTGAGGATATGACAGAGTCCACCATGGACTTTGACAGA 
ATGCCAGATGCCTCTTTGACATCAAGTGTTCTCTCAGGAGCGTCTCTTACAGATTCGACCAGC 
AGTGCAACACTCTCTATCCCAGATGTTAAAATAACCCAACATTTTTCAACAGAAGAAATTGAG
GATGAATATGTAACCGATCATACAAGAGAAATTCAAGAGATAATTGCCCATGAATCGCTGATT 
TTGACCTACTCGGAGCCTTCAGAAAGTGCTACATCTGTCCCACCCTCTGACACACCTTCTCTC 
ACATCATCTGTTTCTTCGGTCTGTACCACAGATAGCTCTTCACCCATTACTACCCTGGATAGC 
ATAACCACAGTTTATACAGAGCCAGTGGACATGATAACTAAATTTGAAGATTCTGAGGAAATT 
TCTTCATCAACTTATTTTCCAGGCAGCATTATAGACTATCCAGAAGAAATAAGTGCATCTTTA
GATCGGACTGCCCCACCAGATGGTAGAGCTAGTGCTGATCATATTGTTATTTCCTTATCTGAT 
ATGGCATCTTCTATCATAGAATCTGTAGTACCTAAACCTGAAGGGCCAGTTGCTGACACTGTT 
TCTACTGACTTACTTATATCTGAAAAGGACCCAGTGAAGAAAGCCAAGAAGGAAACTGGGAAT 
GGAATCATTCTGGAAGTTTTGGAAGCTTACAGAGATAAAAAGGAGTTGGAGGCCGAACGAACA 
AAAAGTAGCTTATCCGAAACCGTGTTTGATCACCCACCTTCTTCTGTAATAGCCCTTCCAATG
AAAGAGCAGCTTTCAACTACATACTTTACATCTGGAGAGACCTTTGGTCAGGAAAAACCTGCA 
TCTCAGTTACCATCTGGCAGTCCTTCTGTTTCCTCTCTTCCAGCTAAACCTCGCCCATTCTTT 
AGAAGTTCTTCTTTGGATATATCAGCTCAACCTCCTCCCCCTCCTCCCCCTCCCCCTCCTCCT
CCTCCTCCACCACCACCCCCTCCTCCCCCACCACTTCCTCCACCAACTTCACCTAAACCAACT
ATTCTTCCTAAAAAAAAGTTAACAGTTGCATCTCCAGTGACTACAGCTACACCTCTGTTTGAT
GCTGTTACTACTCTAGAGACCACAGCTGTTCTGAGAAGTAATGGATTACCTGTTACAAGAATA
TGTACTACTGCACCTCCTCCTGTTCCTCCTAAGCCATCTTCAATTCCATCTGGACTTGTATTT 
ACCCACAGGCCTGAGCCAAGCAAACCTCCAATCGCCCCCAAACCAGTGATTCCTCAGCTTCCA 
ACAACTACACAAAAACCAACAGATATACACCCCAAACCAACAGGCCTATCTTTAACTTCAAGT
ATGACCTTAAATTCAGTGACTTCAGCAGATTATAAATTGCCTTCCCCTACCTCCCCACTTTCC 
CCACACTCCAACAAGTCTTCACCAAGATTTTCCAAATCCCTCACAGAAACTTATGTAGTTATT 
ACATTGCCATCTGAACCAGGGACTCCAACAGATTCTTCTGCTAGTCAAGCAATTACCAGTTGG 
CCCTTGGGATCACCCTCCAAAGATCTGGTTTCTGTTGAACCTGTGTTTTCTGTAGTTCCTCCT 
GTGACAGCTGTAGAAATTCCAATTTCTTCAGAACAGACCTTCTACATCTCTGGAGCTTTACAG 
ACATTTTCTGCTACCCCTGTCACAGCACCCTCTTCATTTCAAGCAGCTCCCACATCAGTTACA 
CAGTTTCTCACTACTGAAGTTTCCAAGACTGAGGTTTCAGCAACCAGAAGTACAGCTCCTAGT 
GTTGGTCTCAGCAGCATTTCCATAACAATTCCTCCAGAGCCTCTTGCTCTAGATAACATACAT 
TTAGAGAAGCCTCAGTATAAAGAAGATGGAAAATTGCAACTTGTTGGTGATGTAATTGATTTG
CGTACAGTACCAAAGGTAGAAGTTAAAACAACTGATAAATGTATTGATCTTTCTGCTTCTACA 
ATGGATGTGAAAAGGCAGATCACAGCAAATGAAGTTTATGGGAAACAAATTAGTGCTGTCCAA 
CCCTCTATTATAAATCTTAGTGTGACATCATCAATAGTGACTCCTGTATCTCTGGCCACTGAG 
ACAGTGACCTTTGTCACATGCACAGCTAGTGCAAGTTACACTACAGGCACAGAAAGCCTAGTG 
GGTGCAGAACATGCAATGACAACACCACTCCAACTTACAACATCAAAGCATGCTGAGCCCCCA 
TACAGGATACCAAGTGACCAGGTCTTTCCTATAGCTAGGGAAGAAGCACCAATAAACTTATCT 
CTAGGTACTCCAGCACATGCAGTGACATTGGCTATTACAAAACCTGTCACTGTGCCTCCTGTT
GGTGTCACAAATGGATGGACTGATAGCACCGTATCCCAGGGAATCACTGATGGGGAAGTAGTG 
GATCTCAGTACAACCAAGTCTCACAGAACAGTCGTAACAATGGATGAGTCTACTTCAAGTGTG 
ATGACCAAAATAATAGAAGATGAAAAACCCGTTGATTTAACCGCAGGGAGAAGAGCTGTGTGC 
TGTGATGTGGTTTATAAATTACCATTTGGAAGGAGCTGCACAGCACAGCAGCCTGCAACTACT 
CTTCCTGAGGATCGTTTTGGTTATAGGGATGACCACTATCAGTATGATCGATCAGGGCCATAT
GGTTATAGAGGGATTGGGGGAATGAAGCCTTCCATGTCTGACACAAATTTAGCAGAAGCTGGA 
CATTTTTTCTATAAAAGTAAGAATGCTTTTGATTATTCTGAAGGAACTGACACAGCAGTAGAT 
CTGACTTCAGGGAGAGTTACTACAGGTGAGGTAATGGATTATTCAAGCAAGACTACAGGTCCA 
TATCCAGAAACACGACAAGTCATTTCAGGAGCTGGGATTAGTACCCCACAGTATTCCACAGCA

AGAATGACACCACCACCAGGACCCCAGTATTGTGTGGGGAGTGTTTTGAGGTCATCTAATGGT
GTTGTCTATTCTTCAGTAGCAACTCCAACACCCTCTACATTTGCTATCACCACACAACCTGGC

TCCATTTTCAGCACCACAGTGAGGGATTTGTCTGGTATTCATACGGCTGATGCAGTGACTTCA 
TTACCTGCCATGCACCATAGCCAGCCAATGCCTAGATCATATTTTATAACAACAGGTGCATCT 
GAAACGGACATTGCAGTAACTGGTATTGATATCAGTGCCAGTTTGCAAACTATTACTATGGAG
TCTCTTACTGCTGAGACGATAGACTCTGTTCCCACTTTAACCACAGCATCCGAAGTGTTTCCT 
GAAGTGGTGGGAGATGAAAGTGCTCTTTTAATTGTCCCTGAAGAAGATAAACAACAGCAGCAG 
CTAGACTTGGAGCGTGAGCTCCTGGAACTGGAGAAAATTAAGCAACAGCGCTTTGCTGAGGAA 
TTGGAGTGGGAACGTCAGGAAATTCAAAGGTTCCGAGAACAAGAAAAGATCATGGTTCAGAAA 
AAGTTGGAGGAGCTGCAGTCTATGAAGCAACACCTTCTCTTTCAGCAAGAAGAAGAGCGGCAA 
GCCCAGTTCATGATGAGGCAGGAGACGTTAGCTCAGCAACAGTTACAGCTTGAGCAGATCCAA 
CAGCTGCAACAACAGCTTCACCAGCAGCTGGAGGAGCAAAAGATTCGGCAGATCTACCAGTAT 
AACTATGACCCTTCTGGAACTGCTTCTCCACAAACCACTACAGAGCAGGCAATTTTGGAAGGT 
CAGTATGCTGCTCTGGAAGGCAGCCAATTTTGGGCAACTGAAGATGCAACCACCACAGCTTCA
GCTGTTGTGGCAATTGAAATACCACAAAGCCAAGGATGGTACACCGTTCAGTCTGATGGTGTT 
ACTCAGTACATTGCCCCACCTGGTATCCTGAGCACTGTTTCAGAAATACCTCTAACAGATGTT 
GTTGTGAAAGAGGAAAAACAACCCAAAAAGAGAAGTTCTGGAGCTAAAGTCCGAGGACAGTAT 
GATGACATGGGAGAAAATATGACAGATGATCCCCGAAGTTTTAAAAAGATAGTGGACAGTGGT 
GTACAAACGGATGACGAAGATGCCACAGATCGGAGCTATGTGAGTAGGAGAAGGAGAACTAAA 
AAGAGTGTGGATACAAGCGTCCAAACTGATGATGAAGATCAGGATGAGTGGGATATGCCTACT 
AGATCAAGGAGGAAAGCTCGTGTAGGGAAATATGGTGACAGCATGACAGAGGCTGACAAGACC 
AAACCCCTTTCCAAAGTCTCCAGCATAGCAGTTCAAACGGTAGCAGAGATATCTGTGCAAACT 
GAACCAGTTGGAACCATAAGAACACCCTCCATACGGGCACGAGTGGATGCCAAGGTAGAAATA 
ATTAAACACATTTCAGCACCTGAAAAGACTTACAAAGGGGGCAGTTTAGGATGTCAAACAGAA
GCAGATTCAGACACACAAAGTCCTCAATATCTGAGTGCCACATCTCCACCCAAAGACAAGAAA 
CGCCCAACACCTTTAGAGATTGGTTATTCATCTCACCTCCGGGCAGATTCCACAGTACAGCTG 
GCTCCTTCCCCACCCAAATCCCCCAAAGTCCTTTACTCACCCATCTCACCACTTTCACCAGGC 
AAAGCCTTAGAATCAGCCTTTGTACCTTATGAAAAACCCCTCCCTGATGATATAAGTCCACAG 
AAAGTACTGCATCCAGATATGGCTAAAGTTCCCCCAGCAAGTCCTAAGACAGCCAAGATGATG
CAGCGTTCTATGTCTGACCCCAAGCCTCTGAGTCCAACAGCAGACGAAAGTTCCAGGGCTCCT 
TTTCAGTATACCGAGGGCTATACGACTAAAGGTTCTCAAACCATGACATCCTCTGGAGCCCAG 
AAAAAAGTTAAAAGAACTCTGCCAAATCCACCTCCTGAGGAGATTTCCACAGGAACTCAATCC 
ACATTCAGCACAATGGGCACAGTTTCCAGGAGAAGGATCTGCAGAACCAACACAATGGCACGA 
GCCAAGATTCTCCAGGACATAGACAGAGAGCTTGATCTTGTGGAAAGGGAGTCTGCAAAACTT
CGAAAGAAACAAGCAGAGCTTGATGAAGAAGAAAAGGAGATTGATGCTAAGCTACGATACCTG 
GAAATGGGAATTAACAGGAGGAAAGAGGCCCTATTAAAGGAGAGAGAAAAGAGAGAACGAGCC 
TACCTCCAGGGAGTAGCTGAGGATCGTGATTACATGTCTGACAGTGAAGTGAGTAGCACAAGA
CCAACCCGAATAGAAAGTCAGCATGGCATTGAGCGACCAAGAACTGCTCCCCAAACTGAATTC
AGCCAGTTTATACCACCACAAACCCAAACAGAATCTCAACTAGTTCCTCCGACAAGTCCTTAC
ACACAATACCAGTACTCTTCCCCTGCTCTTCCTACCCAAGCACCCACCTCATACACTCAACAG
TCTCATTTTGAGCAACAAACTTTGTACCATCAGCAAGTTTCACCTTATCAGACTCAGCCAACA
TTCCAAGCTGTGGCAACAATGTCCTTCACACCTCAAGTTCAACCTACACCAACCCCACAGCCT
TCTTATCAGTTACCTTCACAGATGATGGTGATACAACAGAAGCCACGGCAAACTACATTATAT
TTGGAGCCCAAGATAACCTCAAACTATGAAGTGATTCGCAACCAACCCCTTATGATAGCACCT
GTTTCTACGGATAACACATTTGCTGTTTCCCATCTTGGTAGTAAGTACAATAGTTTAGACTTG
AGAATAGGTTTGGAGGAAAGAAGTAGCATGGCAAGCAGTCCAATATCAAGCATATCTGCAGAT
TCTTTCTATGCAGATATTGATCACCATACTCCACGAAATTATGTCCTAATTGACGACATTGGA
GAGATCACCAAAGGAACAGCGGCATTAAGCACCGCATTTAGCCTTCATGAAAAGGATCTGTCA

AAAACAGACCGTCTCCTTCGAACCACTGAGACACGCCGGTCTCAAGAAGTGACAGATTTCCTA
GCACCTTTACAGTCTTCCTCTAGATTGCATAGTTATGTGAAGGCGGAGGAAGACCCAATGGAG
GATCCTTACGAGTTAAAGCTTCTGAAACATCAGATTAAACAGGAATTTCGTAGAGGGACAGAG

AGCTTAGATCACCTTGCTGGTCTTTCTCATTATTACCATGCTGATACTAGCTACAGACATTTT
CCAAAATCTGAGAAGTATAGCATCAGTAGACTCACACTTGAAAAACAAGCAGCAAAACAACTG
CCAGCAGCCATACTTTATCAAAAGCAGTCAAAGCATAAGAAATCACTAATTGACCCTAAAATG 
TCAAAATTTTCACCTATTCAAGAAAGTAGAGACCTTGAACCTGATTATTCAAGCTATATGACT 
TCTAGCACTTCATCTATTGGTGGCATTTCCTCCAGGGCAAGGCTCCTTCAAGATGACATCACT 
TTTGGCCTCAGAAAAAATATTACAGACCAACAAAAATTTATGGGATCTTCTCTTGGCACAGGA 
CTGGGCACATTAGGAAATACCATACGCTCAGCTCTGCAGGATGAAGCGGATAAGCCATACAGT 
AGTGGCAGCAGGTCCAGACCTTCCTCCAGACCTTCCTCTGTCTATGGGCTTGATTTATCAATT 
AAAAGGGATTCTTCTAGCTCTTCCCTAAGACTGAAAGCTCAAGAGGCTGAAGCTCTAGATGTT 
TCCTTTAGTCATGCATCATCCTCTGCCAGAACTAAGCCGACCAGTTTGCCAATTAGTCAAAGT 
AGAGGAAGAATACCAATTGTGGCCCAGAATTCTGAAGAAGAAAGCCCACTCAGTCCTGTTGGC
CAGCCAATGGGAATGGCCAGGGCTGCAGCTGGACCCCTGCCACCAATATCTGCAGACACCAGG 
GATCAGTTTGGATCAAGCCACTCATTGCCTGAAGTTCAGCAACACATGAGGGAAGAATCACGG 
ACTCGAGGCTATGACCGTGACATAGCATTCATCATGGATGACTTCCAACATGCCATGTCAGAC 
AGTGAAGCCTATCATCTGCGTCGTGAGGAAACAGATTGGTTTGATAAACCCAGGGAGTCTCGT 
TTGGAAAATGGACATGGTCTGGACCGAAAACTGCCGGAAAGATTGGTCCACTCTAGACCACTC 
AGTCAACATCAAGAGCAAATTATACAGATGAACGGGAAAACTATGCACTACATCTTTCCTCAC 
GCAAGGATAAAAATAACAAGAGACTCAAAGGATCACACAGTTTCAGGTAATGGATTAGGAATT 
AGAATTGTGGGTGGTAAAGAAATCCCGGGACATAGTGGAGAAATTGGAGCCTATATTGCCAAG 
ATTCTTCCTGGGGGAAGTGCGGAACAGACGGGGAAGCTTATGGAAGGGATGCAAGTATTGGAA 
TGGAATGGAATTCCCTTGACTTCTAAAACATATGAAGAAGTTCAGAGTATCATTAGTCAGCAA
AGTGGGGAAGCAGAAATATGTGTAAGACTGGACCTCAATATGCTATCAGATTCTGAAAATTCC 
CAGCATCTGGAACTTCATGAGCCACCAAAAGCTGTGGATAAGGCGAAATCCCCAGGGGTTGAT 
CCTAAGCAGTTGGCAGCAGAACTCCAGAAGGTTTCACTACAGCAGTCACCGCTGGTTCTGTCA
TCAGTTGTTGAAAAAGGATCTCATGTTCATTCAGGTCCTACATCAGCAGGATCCAGTTCCGTT 
CCCAGCCCTGGGCAACCAGGGTCCCCCTCAGTGAGCAAAAAGAAGCACGGCAGCAGCAAGCCT 
ACCGATGGAACAAAGGTTGTCTCTCATCCAATTACAGGAGAAATTCAGCTTCAAATTAACTAT 
GATCTTGGAAATCTCATAATACATATTCTCCAAGCAAGAAATCTTGTTCCTCGAGACAACAAT 
GGTTATTCTGACCCTTTTGTGAAAGTGTACCTTCTTCCAGGGAGAGGTCAAGTCATGGTTGTC 
CAGAATGCAAGTGCTGAGTACAAGAGAAGGACTAAACATGTCCAGAAAAGTCTTAATCCTGAG
TGGAATCAAACAGTAATTTATAAAAGTATTTCCATGGAACAGCTCAAGAAGAAAACACTGGAC 
GTGACAGTTTGGGATTATGATAGATTTTCATCCAACGACTTCCTTGGGGAGGTATTGATTGAT 
TTATCTAGCACATCTCACCTCGATAACACTCCAAGGTGGTATCCTCTCAAAGAACAGACTGAA 
AGCATTGATCATGGCAAGTCTCATTCCAGTCAGAGCAGCCAGCAGTCCCCAAAGCCATCTGTT 
ATCAAAAGCAGAAGCCATGGTATCTTCCCTGACCCATCAAAGGACATGCAGGTTCCCACCATT 
GAGAAATCCCATAGTAGTCCTGGTAGCTCAAAATCATCATCAGAAGGCCATCTCCGTTCTCAT
GGACCATCTCGCAGTCAAAGCAAAACCAGCGTCACTCAGACCCACCTGGAAGATGCAGGGGCT 
GCCATAGCTGCTGCCGAAGCTGCCGTGCAACAACTCCGCATTCAACCAAGTAAAAGACGCAAA 
TAAATTCCTCAGCATGGCAGCTTAATGTTCATCTGTTGCCTTTCTTTCCTGCTGTCCTTTCCT 


GTTTGCTTTCAGTTTTCAACATCCTCTGCTCACCCTGTTCTCTGTCCCTTTGTCTGTGTAAGA 


ACGATATAAATACATTAATATGCTCTTTCTTATTTAGATTTTTTTTATTTACATAGACTGAAA 


TAAAACTGGCTGTTTCTCCTTTGTTTCCACCATCCATCCAACCTGGCTCATAGCATTTGATAC 


AGTGTCTGTGATGTTTGGAAGCAAAGCAATGTTGTGTGTCCTTTTTGTTTGCGCTTAAATATC 




ORF Start: ATG at 61


ORF Stop: TAA at 14680 




NOV65a, CG138043-01 Protein Sequence

SEQ ID NO: 240

4873 aa

MW at 531766.4kD


MGNEASLEGEGLPEGLAAAAAAGGGASGAGSPSHTAIPAGMEADLSQLSEEERRQIAAVMSRA
QGLPKGSVPPAAAESPSMHRKQELDSSHPPKQSGRPPDPGRPAQPGLSKSRTTDTFRSEQKLP
GRSPSTISLKESKSRTDLKEEHKSSMMPGFLSEVNALSAVSSVVNKFNPFDLISDSEASQEET
TKKQKVVQKEQGKPEGIIKPPLQQQPPKPIPKQQGPGRDPLQQDGTPKSISSQQPEKIKSQPP
GTGKPIQGPTQTPQTDHAKLPLQRDASRPQTKQADIVRGESVKPSLPSPSKPPIQQPTPGKPP 
AQQPGHEKSQPGPAKPPAQPSGLTKPLAQQPGTVKPPVQPPGTTKPPAQPLGPAKPPAQQTGS 
EKPSSEQPGPKALAQPPGVGKTPAQQPGPAKPPTQQVGTPKPLAQQPGLQSPAKAPGPTKTPA 
QTKPPSQQPGSTKPPPQQPGPAKPSPQQPGSTKPPSQQPGSAKPSAQQPSPAKPSAQQFTKPV 
SQTGFGKPLQPPTVSPSAKQPPSQGLPKTICPLCNTTELLLHVPEKANFNTCTECQTTVCSLC
GFNPNPHLTEAKEWLCLNCQMKRALGGDLAPVPSSPQPKLKTAPVTTTSAVSKSSPQPQQTSP 
KKDAAPKQDLSKAPEPKKPPPLVKQPTLHGSPSAKAKQPPEADSLSKPAPPKEPSVPSEQDKA 
PVADDKPKQPKMVKPTTDLVSSSSATTKPDIPSSKVQSQAEEKTTPPLKTDSAKPSQSFPPTG 
EKVTPFDSKAIPRPASDSKIISHPGPSSESKGQKQVDPVQKKEEPKKAQTKMSPKPDAKPMPK 
GSPTPPGPRPTAGQTVPTPQQSPKPQEQSRRFSLNLGSITDAPKSQPTTPQETVTGKLFGFGA 
SIFSQASNLISTAGQPGPHSQSGPGAPMKQAPAPSQPPTSQGPPKSTGQAPPAPAKSIPVKKE 
TKAPAAEKLEPKAEQAPTVKRTETEKKPPPIKDSKSLTAEPQKAVLPTKLEKSPKPESTCPLC 
KTELNIGSKDPPNFNTCTECKNQVCNLCGFNPTPHLTEIQEWLCLNCQTQRAISGQLGDIRKM 
PPAPSGPKASPMPVPTESSSQKTAVPPQVKLVKKQEQEVKTEAEKVILEKVKETLSMEKIPPM 
VTTDQKQEESKLEKDKASALQEKKPLPEEKKLIPEEEKIRSEEKKPLLEEKKPTPEDKKLLPE 
AKTSAPEEQKHDLLKSQVQIAEEKLEGRVAPKTVQEGKQPQTKMEGLPSGTPQSLPKEDDKTT 
KTIKEQPQPPCTAKPDQEKEDDKSDTSSSQQPKSPQGLSDTGYSSDGISSSLGEIPSLIPTDE 
KDILKGLKKDSFSQESSPSSPSDLAKLESTVLSILEAQASTLADEKSEKKTQPHEVSPEQPKD
QEKTQSLSETLEITISEEEIKESQEERKDTFKKDSQQDIPSSKDHKEKSEFVDDITTRREPYD 
SVEESSESENSPVPQRKRRTSVGSSSSDEYKQEDSQGSGEEEDFIRKQIIEMSADEDASGSED 
DEFIRNQLKEISSSTESQKKEETKGKGKITAGKHRRLTRKSSTSIDEDAGRRHSWHDEDDEAF 
DESPELKYRETKSQESEELVVTGGGGLRRFKTIELNSTIADKYSAESSQKKTSLYFDEEPELE
MESLTDSPEDRSRGEGSSSLHASSFTPGTSPTSVSSLDEDSDSSPSHKKGESKQQRKARHRPH 
GPLLPTIEDSSEEEELREEEELLKEQEKQREIEQQQRKSSSKKSKKDKDELRAQRRRERPKTP
PSNLSPIEDASPTEELRQAAEMEELHRSSCSEYSPSIESDPEGFEISPEKIIEVQKVYKLPTA 
VSLYSPTDEQSIMQKEGSQKALKSAEEMYEEMMHKTHKYKAFPAANERDEVFEKEPLYGGMLI 
EDYIYESLVEDTYNGSVDGSLLTRQEEENGFMQQKGREQKIRLSEQIYEDPMQKITDLQKEFY

ELESLHSVVPQEDIVSSSFIIPESHEIVDLGTMVTSTEEERKLLDADAAYEELMKRQQMQLTP

GSSPTQAPIGEDMTESTMDFDRMPDASLTSSVLSGASLTDSTSSATLSIPDVKITQHFSTEEI 

EDEYVTDHTREIQEIIAHESLILTYSEPSESATSVPPSDTPSLTSSVSSVCTTDSSSPITTLD 

SITTVYTEPVDMITKFEDSEEISSSTYFPGSIIDYPEEISASLDRTAPPDGRASADHIVISLS

DMASSIIESVVPKPEGPVADTVSTDLLISEKDPVKKAKKETGNGIILEVLEAYRDKKELEAER

TKSSLSETVFDHPPSSVIALPMKEQLSTTYFTSGETFGQEKPASQLPSGSPSVSSLPAKPRPF 

FRSSSLDISAQPPPPPPPPPPPPPPPPPPPPPPLPPPTSPKPTILPKKKLTVASPVTTATPLF 

DAVTTLETTAVLRSNGLPVTRICTTAPPPVPPKPSSIPSGLVFTHRPEPSKPPIAPKPVIPQL

PTTTQKPTDIHPKPTGLSLTSSMTLNSVTSADYKLPSPTSPLSPHSNKSSPRFSKSLTETYVV

ITLPSEPGTPTDSSASQAITSWPLGSPSKDLVSVEPVFSVVPPVTAVEIPISSEQTFYISGAL

QTFSATPVTAPSSFQAAPTSVTQFLTTEVSKTEVSATRSTAPSVGLSSISITIPPEPLALDNI 

HLEKPQYKEDGKLQLVGDVIDLRTVPKVEVKTTDKCIDLSASTMDVKRQITANEVYGKQISAV 

QPSIINLSVTSSIVTPVSLATETVTFVTCTASASYTTGTESLVGAEHAMTTPLQLTTSKHAEP

PYRIPSDQVFPIAREEAPINLSLGTPAHAVTLAITKPVTVPPVGVTNGWTDSTVSQGITDGEV 

VDLSTTKSHRTVVTMDESTSSVMTKIIEDEKPVDLTAGRRAVCCDVVYKLPFGRSCTAQQPAT

TLPEDRFGYRDDHYQYDRSGPYGYRGIGGMKPSMSDTNLAEAGHFFYKSKNAFDYSEGTDTAV 

DLTSGRVTTGEVMDYSSKTTGPYPETRQVISGAGISTPQYSTARMTPPPGPQYCVGSVLRSSN

GVVYSSVATPTPSTFAITTQPGSIFSTTVRDLSGIHTADAVTSLPAMHHSQPMPRSYFITTGA

SETDIAVTGIDISASLQTITMESLTAETIDSVPTLTTASEVFPEVVGDESALLIVPEEDKQQQ

QLDLERELLELEKIKQQRFAEELEWERQEIQRFREQEKIMVQKKLEELQSMKQHLLFQQEEER

QAQFMMRQETLAQQQLQLEQIQQLQQQLHQQLEEQKIRQIYQYNYDPSGTASPQTTTEQAILE 

GQYAALEGSQFWATEDATTTASAVVAIEIPQSQGWYTVQSDGVTQYIAPPGILSTVSEIPLTD

VVVKEEKQPKKRSSGAKVRGQYDDMGENMTDDPRSFKKIVDSGVQTDDEDATDRSYVSRRRRT

KKSVDTSVQTDDEDQDEWDMPTRSRRKARVGKYGDSMTEADKTKPLSKVSSIAVQTVAEISVQ 

TEPVGTIRTPSIRARVDAKVEIIKHISAPEKTYKGGSLGCQTEADSDTQSPQYLSATSPPKDK

KRPTPLEIGYSSHLRADSTVQLAPSPPKSPKVLYSPISPLSPGKALESAFVPYEKPLPDDISP 

QKVLHPDMAKVPPASPKTAKMMQRSMSDPKPLSPTADESSRAPFQYTEGYTTKGSQTMTSSGA 

QKKVKRTLPNPPPEEISTGTQSTFSTMGTVSRRRICRTNTMARAKILQDIDRELDLVERESAK 

LRKKQAELDEEEKEIDAKLRYLEMGINRRKEALLKEREKRERAYLQGVAEDRDYMSDSEVSST 

RPTRIESQHGIERPRTAPQTEFSQFIPPQTQTESQLVPPTSPYTQYQYSSPALPTQAPTSYTQ 

QSHFEQQTLYHQQVSPYQTQPTFQAVATMSFTPQVQPTPTPQPSYQLPSQMMVIQQKPRQTTL 

YLEPKITSNYEVIRNQPLMIAPVSTDNTFAVSHLGSKYNSLDLRIGLEERSSMASSPISSISA 

DSFYADIDHHTPRNYVLIDDIGEITKGTAALSTAFSLHEKDLSKTDRLLRTTETRRSQEVTDF 

LAPLQSSSRLHSYVKAEEDPMEDPYELKLLKHQIKQEFRRGTESLDHLAGLSHYYHADTSYRH 

FPKSEKYSISRLTLEKQAAKQLPAAILYQKQSKHKKSLIDPKMSKFSPIQESRDLEPDYSSYM 

TSSTSSIGGISSRARLLQDDITFGLRKNITDQQKFMGSSLGTGLGTLGNTIRSALQDEADKPY 

SSGSRSRPSSRPSSVYGLDLSIKRDSSSSSLRLKAQEAEALDVSFSHASSSARTKPTSLPISQ 

SRGRIPIVAQNSEEESPLSPVGQPMGMARAAAGPLPPISADTRDQFGSSHSLPEVQQHMREES 

RTRGYDRDIAFIMDDFQHAMSDSEAYHLRREETDWFDKPRESRLENGHGLDRKLPERLVHSRP 

LSQHQEQIIQMNGKTMHYIFPHARIKITRDSKDHTVSGNGLGIRIVGGKEIPGHSGEIGAYIA 

KILPGGSAEQTGKLMEGMQVLEWNGIPLTSKTYEEVQSIISQQSGEAEICVRLDLNMLSDSEN 

SQHLELHEPPKAVDKAKSPGVDPKQLAAELQKVSLQQSPLVLSSVVEKGSHVHSGPTSAGSSS

VPSPGQPGSPSVSKKKHGSSKPTDGTKVVSHPITGEIQLQINYDLGNLIIHILQARNLVPRDN

NGYSDPFVKVYLLPGRGQVMVVQNASAEYKRRTKHVQKSLNPEWNQTVIYKSISMEQLKKKTL

EVTVWDYDRFSSNDFLGEVLIDLSSTSHLDNTPRWYPLKEQTESIDHGKSHSSQSSQQSPKPS 

VIKSRSHGIFPDPSKDMQVPTIEKSHSSPGSSKSSSEGHLRSHGPSRSQSKTSVTQTHLEDAG 

AAIAAAEAAVQQLRIQPSKRRK 



Further analysis of the NOV65a protein yielded the following properties shown in 
Table 65B. 



Table 65B. Protein Sequence Properties NOV65a

PSort analysis: 0.9800 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV65a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 65C.



Table 65C. Geneseq Results for NOV65a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV65a Residues; Match Residues | Identities; Similarities for the Matched Region | Expect Value
AAM58340 | Human brain expressed single exon probe encoded protein SEQ ID NO: 30445 - Homo sapiens, 748 aa. [WO200157275-A2, 09-AUG-2001] | 1278..2025; 1..748 | 748/748 (100%); 748/748 (100%) | 0.0
AAM58328 | Human brain expressed single exon probe encoded protein SEQ ID NO: 30433 - Homo sapiens, 396 aa. [WO200157275-A2, 09-AUG-2001] | 3976..4371; 1..396 | 396/396 (100%); 396/396 (100%) | 0.0
AAM58208 | Human brain expressed single exon probe encoded protein SEQ ID NO: 30313 - Homo sapiens, 182 aa. [WO200157275-A2, 09-AUG-2001] | 2789..2970; 1..182 | 182/182 (100%); 182/182 (100%) | e-103
AAM32174 | Peptide #6211 encoded by probe for measuring placental gene expression - Homo sapiens, 162 aa. [WO200157272-A2, 09-AUG-2001] | 1823..1984; 1..162 | 162/162 (100%); 162/162 (100%) | 2e-87
AAM71894 | Human bone marrow expressed probe encoded protein SEQ ID NO: 32200 - Homo sapiens, 162 aa. [WO200157276-A2, 09-AUG-2001] | 1823..1984; 1..162 | 162/162 (100%); 162/162 (100%) | 2e-87




In a BLAST search of public sequence databases, the NOV65a protein was found to
have homology to the proteins shown in the BLASTP data in Table 65D.



Table 65D. Public BLASTP Results for NOV65a

Protein Accession Number | Protein/Organism/Length | NOV65a Residues; Match Residues | Identities; Similarities for the Matched Portion | Expect Value
Q9Y6V0 | Piccolo protein (Aczonin) (DJ0897G10.1) - Homo sapiens (Human), 5147 aa (fragment). | 37..4872; 1..4760 | 4753/4836 (98%); 4756/4836 (98%) | 0.0
Q9JKS6 | Piccolo protein (Multidomain presynaptic cytomatrix protein) - Rattus norvegicus (Rat), 5085 aa. | 1..4870; 1..4877 | 4060/4924 (82%); 4317/4924 (87%) | 0.0
Q9QYX7 | Piccolo protein (Presynaptic cytomatrix protein) (Aczonin) (Brain-derived HLMN protein) - Mus musculus (Mouse), 5038 aa. | 1..4870; 1..4830 | 4047/4922 (82%); 4293/4922 (86%) | 0.0
Q9PU36 | Piccolo protein (Aczonin) - Gallus gallus (Chicken), 5120 aa (fragment). | 87..4872; 1..4853 | 3260/4986 (65%); 3724/4986 (74%) | 0.0
T00332 | hypothetical protein KIAA0559 - human, 1212 aa (fragment). | 3662..4873; 1..1212 | 1212/1212 (100%); 1212/1212 (100%) | 0.0



PFam analysis predicts that the NOV65a protein contains the domains shown in
Table 65E.



Table 65E. Domain Analysis of NOV65a 


Pfam Domain | NOV65a Match Region | Identities; Similarities for the Matched Region | Expect Value
PDZ | 4440..4527 | 26/90 (29%); 69/90 (77%) | 8.4e-09
C2 | 4647..4745 | 40/104 (38%); 78/104 (75%) | 7.9e-27




Example 66. 

The NOV66 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 66A. 



Table 66A. NOV66 Sequence Analysis 




NOV66a, CG138208-01 DNA Sequence

SEQ ID NO: 241

2914 bp


CCGGGCCATGGCGGAACGCGGAGGGGCGGGCGGTGGTCCCGGAGGCGCCGGGGGCGGCAGCGG
CCAGCGGGGATCCGGGGTCGCCCAGTCCCCTCAGCAGCCGCCGCCGCAGCAGCAGCAGCAGCA 
GCCGCCGCAGCAGCCGACGCCCCCCAAGCTGGCCCAGGCCACCTCGTCGTCCTCGTCCACCTC 
GGCGGCGGCTGCCTCCTCCTCGTCCTCGTCTACCTCCACCTCCATGGCCGTGGCGGTGGCCTC 
GGGCTCCGCGCCTCCCGGTGGCCCGGGGCCAGGCCGCACCCCCGCCCCGGTGCAGATGAACCT 
GTACGCCACCTGGGAGGTGGACCGGAGCTCGTCCAGCTGCGTGCCTAGGCTATTCAGCTTGAC 
CCTGAAGAAACTCGTCATGCTAAAAGAAATGGACAAAGATCTTAACTCAGTGGTCATCGCTGT 
GAAGCTGCAGGGTTCAAAAAGAATTCTTCGCTCCAACGAGATCGTCCTTCCAGCTAGTGGACT 
GGTGGAAACAGAGCTCCAATTAACCTTCTCCCTTCAGTACCCTCATTTCCTTAAGCGAGATGC 
CAACAAGCTGCAGATCATGCTGCAAAGGAGAAAACGTTACAAGAATCGGACCATCTTGGGCTA 
TAAGACCTTGGCCGTGGGACTCATCAACATGGCAGAGGTGATGCAGCATCCTAATGAAGGCGC 
ACTGGTGCTTGGCCTACACAGCAACGTGAAGGATGTCTCTGTGCCTGTGGCAGAAATAAAGAT 
CTACTCCCTGTCCAGCCAACCCATTGACCATGAAGGAATCAAATCCAAGCTTTCTGATCGTTC 
TCCTGATATTGACAATTATTCTGAGGAAGAGGAAGAGAGTTTCTCATCAGAACAGGAAGGCAG 
TGATGATCCATTGCATGGGCAGGACTTGTTCTACGAAGACGAAGATCTCCGGAAAGTGAAGAA 
GACCCGGAGGAAACTAACCTCAACCTCTGCCATCACAAGGCAACCTAACATCAAACAGAAGTT 





TGTGGCCCTCCTGAAGCGGTTTAAAGTTTCAGATGAGGTGGGCTTTGGGCTGGAGCATGTGTC 
CCGCGAGCAGATCCGGGAAGTGGAAGAGGACTTGGATGAATTGTATGACAGTCTGGAGATGTA 
CAACCCCAGCGACAGTGGCCCTGAGATGGAGGAGACAGAAAGCATCCTCAGCACGCCAAAGCC 
CAAGCTCAAGCCTTTCTTTGAGGGGATGTCGCAGTCCAGCTCCCAGACGGAGATTGGCAGCCT 
CAACAGCAAAGGCAGCCTCGGAAAAGACACCACCAGCCCTATGGAATTGGCTGCTCTAGAAAA 
AATTAAATCTACTTGGATTAAAAACCAAGATGACAGCTTGACTGAAACAGACACTCTGGAAAT 
CACTGACCAGGACATGTTTGGAGATGCCAGCACGAGTCTGGTTGTGCCGGAGAAAGTCAAAAC 
TCCCATGAAGTCCAGTAAAACGGATCTCCAGGGCTCTGCCTCCCCCAGCAAAGTGGAGGGGGT 
GCACACACCCCGGCAGAAGAGGAGCACGCCCCTGAAGGAGCGGCAGCTCTCCAAGCCCCTAAG 
TGAGAGGACCAACAGTTCCGACAGCGAGCGCTCCCCAGATCTGGGCCACAGCACGCAGATTCC 
AAGAAAGGTGGTGTATGACCAGCTCAATCAGATCCTGGTGTCAGATGCAGCCCTCCCAGAAAA 
TGTCATTCTGGTGAACACCACTGACTGGCAGGGCCAGTATGTGGCTGAGCTGCTCCAGGACCA 
GCGGAAGCCTGTGGTGTGCACCTGCTCCACCGTGGAGGTCCAGGCCGTGCTGTCCGCCCTGCT 
CACCCGGATCCAGCGCTACTGCAACTGCAACTCTTCCATGCCGAGGCCAGTGAAGGTGGCTGC 
TGTGGGAGGCCAGAGCTACCTGAGCTCCATCCTCAGGTTCTTTGTCAAGTCCCTGGCCAACAA 
GACCTCCGACTGGCTTGGCTACATGCGCTTCCTCATCATCCCCCTCGGTTCTCACCCTGTGGC 
CAAATACTTGGGGTCAGTCGACAGTAAATACAGTAGTTCCTTCCTGGATTCTGGTTGGAGAGA 
TCTGTTCAGTCGCTCGGAGCCACCAGTGTCAGAGCAACTGGACGTGGCAGGGCGGGTGATGCA
GTACGTCAACGGGGCAGCCACGACACACCAGCTTCCCGTGGCCGAAGCCATGCTGACTTGCCG 
GCATAAGTTCCCTGATGAAGACTCCTATCAGAAGTTTATTCCCTTCATTGGCGTGGTGAAGGT 
GGGTCTGGTTGAAGACTCTCCCTCCACAGCAGGCGATGGGGACGATTCTCCTGTGGTCAGCCT 
TACTGTGCCCTCCACATCACCACCCTCCAGCTCGGGCCTGAGCCGAGACGCCACGGCCACCCC 
TCCCTCCTCCCCATCTATGAGCAGCGCCCTGGCCATCGTGGGGAGCCCTAATAGCCCATATGG
GGACGTGATTGGCCTCCAGGTGGACTACTGGCTGGGCCACCCCGGGGAGCGGAGGAGGGAAGG 
CGACAAGAGGGACGCCAGCTCGAAGAACACCCTCAAGAGTGTCTTCCGCTCAGTGCAGGTGTC 
CCGCCTGCCCCATAGTGGGGAGGCCCAGCTTTCTGGCACCATGGCCATGACTGTGGTCACCAA 
AGAAAAGAACAAGAAAGTTCCCACCATCTTCCTGAGCAAGAAACCCCGAGAAAAGGAGGTGGA
TTCTAAGAGCCAGGTCATTGAAGGCATCAGCCGCCTCATCTGCTCAGCCAAGCAGCAGCAGAC 
TATGCTGAGAGTGTCCATCGATGGGGTCGAGTGGAGTGACATCAAGTTCTTCCAGCTGGCAGC
CCAGTGGCCCACCCATGTCAAGCACTTTCCAGTGGGACTCTTCAGTGGCAGCAAGGCCACCTG 
AGGCCCTGTCTCCCAG 




ORF Start: ATG at 8 


ORF Stop: TGA at 2897




SEQ ID NO: 242    963 aa    MW at 104897.3 kD


NOV66a,

CG138208-01 Protein
Sequence


MAERGGAGGGPGGAGGGSGQRGSGVAQSPQQPPPQQQQQQPPQQPTPPKLAQATSSSSSTSAA 
AASSSSSSTSTSMAVAVASGSAPPGGPGPGRTPAPVQMNLYATWEVDRSSSSCVPRLFSLTLK 
KLVMLKEMDKDLNSVVIAVKLQGSKRILRSNEIVLPASGLVETELQLTFSLQYPHFLKRDANK
LQIMLQRRKRYKNRTILGYKTLAVGLINMAEVMQHPNEGALVLGLHSNVKDVSVPVAEIKIYS 
LSSQPIDHEGIKSKLSDRSPDIDNYSEEEEESFSSEQEGSDDPLHGQDLFYEDEDLRKVKKTR 
RKLTSTSAITRQPNIKQKFVALLKRFKVSDEVGFGLEHVSREQIREVEEDLDELYDSLEMYNP 
SDSGPEMEETESILSTPKPKLKPFFEGMSQSSSQTEIGSLNSKGSLGKDTTSPMELAALEKIK 
STWIKNQDDSLTETDTLEITDQDMFGDASTSLVVPEKVKTPMKSSKTDLQGSASPSKVEGVHT
PRQKRSTPLKERQLSKPLSERTNSSDSERSPDLGHSTQIPRKVVYDQLNQILVSDAALPENVI
LVNTTDWQGQYVAELLQDQRKPVVCTCSTVEVQAVLSALLTRIQRYCNCNSSMPRPVKVAAVG
GQSYLSSILRFFVKSLANKTSDWLGYMRFLIIPLGSHPVAKYLGSVDSKYSSSFLDSGWRDLF
SRSEPPVSEQLDVAGRVMQYVNGAATTHQLPVAEAMLTCRHKFPDEDSYQKFIPFIGVVKVGL
VEDSPSTAGDGDDSPVVSLTVPSTSPPSSSGLSRDATATPPSSPSMSSALAIVGSPNSPYGDV
IGLQVDYWLGHPGERRREGDKRDASSKNTLKSVFRSVQVSRLPHSGEAQLSGTMAMTVVTKEK
NKKVPTIFLSKKPREKEVDSKSQVIEGISRLICSAKQQQTMLRVSIDGVEWSDIKFFQLAAQW 
PTHVKHFPVGLFSGSKAT 


SEQ ID NO: 243    3225 bp


NOV66b,

CG138208-02 DNA
Sequence

CGGCCTCCGTAACCCCCGCCTAGCCGGGCCATGGCGGAACGCGGAGGGGCGGGCGGTGGTCCC 


GGAGGCGCCGGGGGCGGCAGCGGCCAGCGGGGATCCGGGGTCGCCCAGTCCCCTCAGCAGCCG 
CCGCCGCAGCAGCAGCAGCAGCAGCCGCCGCAGCAGCCGACGCCCCCCAAGCTGGCCCAGGCC 
ACCTCGTCGTCCTCGTCCACCTCGGCGGCGGCTGCCTCCTCCTCGTCCTCGTCTACCTCCACC 
TCCATGGCCGTGGCGGTGGCCTCGGGCTCCGCGCCTCCCGGTGGCCCGGGGCCAGGCCGCACC 
CCCGCCCCGGTGCAGATGAACCTGTACGCCACCTGGGAGGTGGACCGGAGCTCGTCCAGCTGC 
GTGCCTAGGCTATTCAGCTTGACCCTGAAGAAACTCGTCATGCTAAAAGAAATGGACAAAGAT 
CTTAACTCAGTGGTCATCGCTGTGAAGCTGCAGGGTTCAAAAAGAATTCTTCGCTCCAACGAG
ATCGTCCTTCCAGCTAGTGGACTGGTGGAAACAGAGCTCCAATTAACCTTCTCCCTTCAGTAC
CCTCATTTCCTTAAGCGAGATGCCAACAAGCTGCAGATCATGCTGCAAAGGAGAAAACGTTAC 
AAGAATCGGACCATCTTGGGCTATAAGACCTTGGCCGTGGGACTCATCAACATGGCAGAGGTG
ATGCAGCATCCTAATGAAGGCGCACTGGTGCTTGGCCTACACAGCAACGTGAAGGATGTCTCT 
GTGCCTGTGGCAGAAATAAAGATCTACTCCCTGTCCAGCCAACCCATTGACCATGAAGGAATC 
AAATCCAAGCTTTCTGATCGTTCTCCTGATATTGACAATTATTCTGAGGAAGAGGAAGAGAGT 
TTCTCATCAGAACAGGAAGGCAGTGATGATCCATTGCATGGGCAGGACTTGTTCTACGAAGAC
GAAGATCTCCGGAAAGTGAAGAAGACCCGGAGGAAACTAACCTCAACCTCTGCCATCACAAGG 
CAACCTAACATCAAACAGAAGTTTGTGGCCCTCCTGAAGCGGTTTAAAGTTTCAGATGAGGTG 
GGCTTTGGGCTGGAGCATGTGTCCCGCGAGCAGATCCGGGAAGTGGAAGAGGACTTGGATGAA
TTGTATGACAGTCTGGAGATGTACAACCCCAGCGACAGTGGCCCTGAGATGGAGGAGACAGAA 
AGCATCCTCAGCACGCCAAAGCCCAAGCTCAAGCCTTTCTTTGAGGGGATGTCGCAGTCCAGC 
TCCCAGACGGAGATTGGCAGCCTCAACAGCAAAGGCAGCCTCGGAAAAGACACCACCAGCCCT 
ATGGAATTGGCTGCTCTAGAAAAAATTAAATCTACTTGGATTAAAAACCAAGATGACAGCTTG 
ACTGAAACAGACACTCTGGAAATCACTGACCAGGACATGTTTGGAGATGCCAGCACGAGTCTG 
GTTGTGCCGGAGAAAGTCAAAACTCCCATGAAGTCCAGTAAAACGGATCTCCAGGGCTCTGCC 
TCCCCCAGCAAAGTGGAGGGGGTGCACACACCCCGGCAGAAGAGGAGCACGCCCCTGAAGGAG 
CGGCAGCTCTCCAAGCCCCTAAGTGAGAGGACCAACAGTTCCGACAGCGAGCGCTCCCCAGAT 
CTGGGCCACAGCACGCAGATTCCAAGAAAGGTGGTGTATGACCAGCTCAATCAGATCCTGGTG 
TCAGATGCAGCCCTCCCAGAAAATGTCATTCTGGTGAACACCACTGACTGGCAGGGCCAGTAT 
GTGGCTGAGCTGCTCCAGGACCAGCGGAAGCCTGTGGTGTGCACCTGCTCCACCGTGGAGGTC 
CAGGCCGTGCTGTCCGCCCTGCTCACCCGGATCCAGCGCTACTGCAACTGCAACTCTTCCATG 
CCGAGGCCAGTGAAGGTGGCTGCTGTGGGAGGCCAGAGCTACCTGAGCTCCATCCTCAGGTTC 
TTTGTCAAGTCCCTGGCCAACAAGACCTCCGACTGGCTTGGCTACATGCGCTTCCTCATCATC 
CCCCTCGGTTCTCACCCTGTGGCCAAATACTTGGGGTCAGTCGACAGTAAATACAGTAGTTCC 
TTCCTGGATTCTGGTTGGAGAGATCTGTTCAGTCGCTCGGAGCCACCAGTGTCAGAGCAACTG 
GACGTGGCAGGGCGGGTGATGCAGTACGTCAACGGGGCAGCCACGACACACCAGCTTCCCGTG 
GCCGAAGCCATGCTGACTTGCCGGCATAAGTTCCCTGATGAAGACTCCTATCAGAAGTTTATT 
CCCTTCATTGGCGTGGTGAAGGTGGGTCTGGTTGAAGACTCTCCCTCCACAGCAGGCGATGGG 
GACGATTCTCCTGTGGTCAGCCTTACTGTGCCCTCCACATCACCACCCTCCAGCTCGGGCCTG 
AGCCGAGACGCCACGGCCACtCCTCCCTCCTCCCCATCTATGAGCAGCGCCCTGGCCATCGTG 
GGGAGCCCTAATAGCCCATATGGGGACGTGATTGGCCTCCAGGTGGACTACTGGCTGGGCCAC
CCCGGGGAGCGGAGGAGGGAAGGCGACAAGAGGGACGCCAGCTCGAAGAACACCCTCAAGAGT 
GTCTTCCGCTCAGTGCAGGTGTCCCGCCTGCCCCATAGTGGGGAGGCCCAGCTTTCTGGCACC 
ATGGCCATGACTGTGGTCACCAAAGAAAAGAACAAGAAAGTTCCCACCATCTTCCTGAGCAAG 
AAACCCCGAGAAAAGGAGGTGGATTCTAAGAGCCAGGTCATTGAAGGCATCAGCCGCCTCATC 
TGTTCTTCCCCCTCCTTAGGCCCCAGCCTGGGCCCAGACCCATCCTCCCAGCCAGGTTTCCCT 
CCAGCAGGCTCCTTCCCTCCCTGTCACCTCCCTCTCACCAACCCGGGGTCTGAGCCCCTCATT 
CCTGACCGTCCGTGTTCTCAGGAGTGGTTGAGGACACAGGGCCCCAGCCCAGCCCTCTGCACC 
CCCCAGCCCGGCCATCTGCGCCCCACAGCCCCTTTGGAGCTTTTCTCTTGTCCTCTCACTCCT 
TCCCAGAAGTTTTTGCACAGAACTTCATTTTGAAAGTGTTTTTCTCATTCTCCATACCTCCCC 


CAAGCTCTCCTCCAGCCCTTCCCAGGGCTCAGCCCTGCTGTCCTGAGCGTCTCCTGGGCCAGA 


GAGAGGAGATGGGGGTGGGAGGGACTGAGTTGATGTTGGGTTTTTCATTCAATAAATTGGTGA 


TTTCTTACCGAC 




ORF Start: ATG at 31    ORF Stop: TGA at 3055




SEQ ID NO: 244    1008 aa    MW at 109341.2 kD


NOV66b,

CG138208-02 Protein
Sequence


MAERGGAGGGPGGAGGGSGQRGSGVAQSPQQPPPQQQQQQPPQQPTPPKLAQATSSSSSTSAA 
AASSSSSSTSTSMAVAVASGSAPPGGPGPGRTPAPVQMNLYATWEVDRSSSSCVPRLFSLTLK 
KLVMLKEMDKDLNSVVIAVKLQGSKRILRSNEIVLPASGLVETELQLTFSLQYPHFLKRDANK
LQIMLQRRKRYKNRTILGYKTLAVGLINMAEVMQHPNEGALVLGLHSNVKDVSVPVAEIKIYS 
LSSQPIDHEGIKSKLSDRSPDIDNYSEEEEESFSSEQEGSDDPLHGQDLFYEDEDLRKVKKTR 
RKLTSTSAITRQPNIKQKFVALLKRFKVSDEVGFGLEHVSREQIREVEEDLDELYDSLEMYNP 
SDSGPEMEETESILSTPKPKLKPFFEGMSQSSSQTEIGSLNSKGSLGKDTTSPMELAALEKIK 
STWIKNQDDSLTETDTLEITDQDMFGDASTSLVVPEKVKTPMKSSKTDLQGSASPSKVEGVHT
PRQKRSTPLKERQLSKPLSERTNSSDSERSPDLGHSTQIPRKVVYDQLNQILVSDAALPENVI
LVNTTDWQGQYVAELLQDQRKPVVCTCSTVEVQAVLSALLTRIQRYCNCNSSMPRPVKVAAVG
GQSYLSSILRFFVKSLANKTSDWLGYMRFLIIPLGSHPVAKYLGSVDSKYSSSFLDSGWRDLF 
SRSEPPVSEQLDVAGRVMQYVNGAATTHQLPVAEAMLTCRHKFPDEDSYQKFIPFIGVVKVGL 
VEDSPSTAGDGDDSPVVSLTVPSTSPPSSSGLSRDATATPPSSPSMSSALAIVGSPNSPYGDV
IGLQVDYWLGHPGERRREGDKRDASSKNTLKSVFRSVQVSRLPHSGEAQLSGTMAMTVVTKEK
NKKVPTIFLSKKPREKEVDSKSQVIEGISRLICSSPSLGPSLGPDPSSQPGFPPAGSFPPCHL 
PLTNPGSEPLIPDRPCSQEWLRTQGPSPALCTPQPGHLRPTAPLELFSCPLTPSQKFLHRTSF 
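

The ORF coordinates and protein lengths reported in Table 66A can be cross-checked arithmetically. The following is a minimal sketch, assuming that "ATG at N" and "TGA at N" give the 1-based position of the first base of the start and stop codons; the function name orf_codon_count is hypothetical and is not part of the original disclosure.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Count the sense codons between a start codon and a stop codon.

    Assumption: both arguments are 1-based positions of the first base
    of the respective codon, so the stop codon itself is not counted.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Coordinates as listed in Table 66A.
print(orf_codon_count(8, 2897))   # NOV66a, reported as 963 aa
print(orf_codon_count(31, 3055))  # NOV66b, reported as 1008 aa
```

Under that assumption, the computed codon counts agree with the 963 aa and 1008 aa lengths listed for SEQ ID NO: 242 and SEQ ID NO: 244.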



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 66B. 






Table 66B. Comparison of NOV66a against NOV66b. 


Protein Sequence 


NOV66a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 


NOV66b 


100..915 
100..915 


738/816(90%) 
738/816(90%) 
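

The "Identities/Similarities" entries in the comparison tables have the form matched/aligned (percent). The sketch below, which assumes that form and uses hypothetical helper names (parse_ratio, percent), shows one way to read such an entry and recompute the percentage. Integer truncation is used here as an assumption; the source does not state how the reported percentages were rounded.

```python
import re

# Matches entries such as "738/816(90%)" or "296/298 (99%)".
_RATIO = re.compile(r"(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_ratio(entry: str) -> tuple[int, int, int]:
    """Return (matched, aligned, reported_percent) from a table entry."""
    m = _RATIO.search(entry)
    if m is None:
        raise ValueError(f"not an identities/similarities entry: {entry!r}")
    matched, aligned, reported = (int(g) for g in m.groups())
    return matched, aligned, reported


def percent(matched: int, aligned: int) -> int:
    """Recompute the percentage, truncating toward zero (an assumption)."""
    return 100 * matched // aligned


matched, aligned, reported = parse_ratio("738/816(90%)")
print(percent(matched, aligned), reported)
```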



Further analysis of the NOV66a protein yielded the following properties shown in 
Table 66C. 



Table 66C. Protein Sequence Properties NOV66a 


PSort 

analysis: 


0.9700 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP
analysis: 


No Known Signal Sequence Predicted 
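

The PSort entry above is a semicolon-separated list of "probability located in" statements. As an illustration only, the following sketch splits such an entry into (location, probability) pairs and orders them by probability; the function name parse_psort is hypothetical, and the parsing assumes the exact phrasing used in this table.

```python
def parse_psort(text: str) -> list[tuple[str, float]]:
    """Split a PSort summary into (location, probability) pairs.

    Assumes each item reads "<p> probability located in <location>" and
    that items are separated by semicolons, as in Table 66C.
    """
    results = []
    for part in text.split(";"):
        part = " ".join(part.split())  # collapse line breaks and extra spaces
        if not part:
            continue
        prob, _, location = part.partition(" probability located in ")
        results.append((location, float(prob)))
    # Highest probability first; ties keep their original order.
    return sorted(results, key=lambda item: item[1], reverse=True)


entry = (
    "0.9700 probability located in nucleus; 0.3000 probability located in "
    "microbody (peroxisome); 0.1000 probability located in mitochondrial "
    "matrix space; 0.1000 probability located in lysosome (lumen)"
)
for location, prob in parse_psort(entry):
    print(f"{prob:.4f}  {location}")
```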






A search of the NOV66a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 66D.



Table 66D. Geneseq Results for NOV66a 


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV66a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB42247 


Human ORFX ORF2011 polypeptide
sequence SEQ ID NO:4022 - Homo
sapiens, 885 aa. [WO200058473-A2,
05-OCT-2000]


85..963 
7..885 


879/879(100%) 
879/879(100%) 


0.0 


AAB92606 


Human protein sequence SEQ ID 
NO: 10867 - Homo sapiens, 389 aa. 
[EP1074617-A2, 07-FEB-2001]


620..917 
1..298


296/298 (99%) 
297/298 (99%) 


e-168 


AAB57122


Human prostate cancer antigen protein
sequence SEQ ID NO: 1700 - Homo
sapiens, 543 aa. [WO200055174-A1,
21-SEP-2000]


468..963 
48..542 


289/501 (57%) 
371/501 (73%) 


e-159 


ABB61495


Drosophila melanogaster polypeptide
SEQ ID NO 11277 - Drosophila
melanogaster, 1164 aa.
[WO200171042-A2, 27-SEP-2001]


59..956
34..1152


336/1140(29%) 
490/1140(42%) 


e-101 


AAB38002


gene 19 clone HCUGE72 - Homo
sapiens, 104 aa. [WO200055371-A1,
21-SEP-2000]


65..102


32/38 (83%)


8e-09




In a BLAST search of public sequence databases, the NOV66a protein was found to
have homology to the proteins shown in the BLASTP data in Table 66E. 



Table 66E. Public BLASTP Results for NOV66a 


Protein 

Accession 

Number 


Protein/Organism/Length


NOV66a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O88588


Cytosolic sorting protein PACS-1a -
Rattus norvegicus (Rat), 961 aa.


1..963
1..961


940/963 (97%) 
948/963 (97%) 


0.0 


BAC04831 


CDNA FLJ39325 fis, clone 
OCBBF2015334, highly similar to 
Rattus norvegicus cytosolic sorting 
protein PACS-1a (PACS-1) mRNA -
Homo sapiens (Human), 829 aa 
(fragment). 


1..879 
1..829 


825/879 (93%) 
826/879 (93%) 


0.0 


Q96MW0 


CDNA FLJ31799 fis, clone 
NT2RI2009037, highly similar to Rattus 
norvegicus cytosolic sorting protein 
PACS-1a (PACS-1) mRNA - Homo
sapiens (Human), 750 aa (fragment).


130..879
1..750


750/750 (100%) 
750/750 (100%) 


0.0


Q9ULP5 


KIAA1175 protein - Homo sapiens
(Human), 641 aa (fragment).


323..963 
1..641 


641/641 (100%)
641/641 (100%)


0.0


O88589


Cytosolic sorting protein PACS-1b -
Rattus norvegicus (Rat), 559 aa. 


1..543 
1..541 


532/543 (97%) 
535/543 (97%) 


0.0

PFam analysis predicts that the NOV66a protein contains the domains shown in the
Table 66F. 



Table 66F. Domain Analysis of NOV66a 



Pfam Domain 



NOV66a Match Region 



Identities/ 
Similarities 

for the Matched Region 



Expect Value 
Example 67.

The NOV67 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 67A. 



Table 67A. NOV67 Sequence Analysis




SEQ ID NO: 245    5634 bp




NOV67a, 

CG138303-01 DNA
Sequence 


CGGGATTGCACCATGGGGAACCAGGATGGGAAGCTGAAGAGGAGCGCAGGTGATGCTTTGCAC 

GAAGGCGGCGGTGGCGCCGAGGATGCGCTGGGGCCCAGGGATGTGGAAGCCACAAAGAAGGGG 

AGCGGGGGCAAGAAGGCGCTAGGCAAGCACGGCAAGGGGGGAGGGGGCGGCGGCGGCGGCGGG 

GAGTCGGGCAAGAAGAAGAGCAAGTCCGACTCCAGAGCCTCGGTGTTTTCCAACCTGCGGATC 

AGGAAGAATCTGTCCAAGGGGAAAGGCGCCGGCGGCTCCCGCGAAGATGTACTGGATTCCCAG

GCCCTGCAGACCGGGGAGCTGGACAGCGCTCACTCCCTGCTCACCAAGACTCCAGACCTCAGC 

CTCTCGGCGGACGAGGCCGGCCTGTCGGATACCGAGTGTGCGGACCCTTTTGAGGTGACCGGT 

CCAGGGGGTCCTGGGCCTGCCGAGGCTAGGGTCGGGGGCCGGCCGATCGCCGAGGATGTGGAA 

ACTGCAGCAGGGGCGCAGGATGGACAAAGGACCAGCTCGGGCTCGGACACGGACATCTATAGC 

TTCCATTCGGCTACGGAGCAAGAGGATTTGCTTTCAGACATCCAGCAGGCGATCCGCCTGCAG 

CAGCAGCAGCAGCAGCAGCTCCAGCTCCAGCTCCAGCAACAGCAGCAGCAGCAGCAGCTCCAG 

GGCGCCGAGGAGCCTGCAGCGCCCCCCACTGCCGTCTCCCCTCAGCCCGGGGCCTTCCTGGGC 

CTGGACCGGTTCCTGCTGGGGCCGGTCTCCGAGGCGCCCAGTCTCCCGGCAGCGCAACCCGCG 

GCCAAAGACTCGCCCTCCTCCACGGCTTTCCCATTTCCCGAGGCCGGGCCGGGGGAGGAAGCG 

GCCGGAGCCCCCGTGCGAGGGGCTGGGGACACGGATGAGGAGGGTGAGGAGGATGCTTTTGAG 

GATGCGCCCCGGGGCTCTCCGGGGGAGGAGTGGGCCCCGGAGGTGGGAGAGGACGCCCCGCAG 

AGGCTGGGGGAAGAGCCGGAGGAGGAGGCGCAAGGACCTGACGCCCCCGCGGCCGCTTCCCTG 

CCCGGCAGCCCCGCGCCTAGCCAGCGCTGTTTCAAGCCCTACCCGCTCATCACCCCCTGCTAC 

ATCAAGACCACCACCCGGCAGCTCAGCTCGCCCAATCACTCCCCGTCTCAGTCCCCTAATCAG

AGCCCCAGGATCAAGAGGCGGCCGGAACCCTCCCTGAGCCGAGGGTCCAGAACTGCCCTGGCC 

TCCGTAGCCGCCCCGGCCAAGAAGCACCGGGCCGACGGCGGCCTTGCGGCCGGCCTGAGCCGC 

TCGGCTGACTGGACGGAGGAGCTAGGCGCCCGCACGCCCCGGGTGGGAGGCTCCGCGCACCTG 

CTGGAGCGCGGGGTGGCGAGTGACAGCGGCGGTGGGGTGTCCCCAGCACTGGCCGCCAAGGCG 

TCTGGGGCCCCCGCGGCTGCGGATGGCTTCCAGAACGTGTTCACAGGTGAGCGCGGGCGAACG 

CTGTTGGAGAAGCTGTTCAGCCAGCAGGAGAACGGGCCTCCAGAAGAAGCAGAGAAGTTTTGC

TCCCGGATCATTGCCATGGGTCTTCTCCTTCCTTTTAGTGATTGCTTCAGGGAACCGTGTAAT 

CAGAATGCCCAAGATCAACTTTATACCTGGGCTGCAGTTAGTCAACCCACACACTCATTGGAC 

TATTCAGAAGGGCAGTTTCCTAGGCGAGTTCCATCCATGGGGCCACCATCCAAACCTCCCGAT 

GAGGAACACAGGCTCGAGGATGCTGAAACAGAATCTCAATCTGCTGTTTCAGAAACTCCCCAA 

AAACGCTCAGATGCTGTCCAGCAGGAAGTTGTTGACATGAAGTCTGAGGGACAGGCCACTGTA

Al 1 u AIjL.AVjv_ J.\jQjAAL,A<jA\.IAI IwvaoAI L IoAvjAACCAAAATAGCTGAACTAGAGAGGCAG 

TATCCTGCCCTGGACACAGAGGTGGCCAGTGGTCATCAAGGGCTTGAGAATGGAGTGACAGCC 

TCAGGCGATGTCTGTCTCGAAGCTCTCAGGTTAGAAGAAAAGGAAGTACGGCATCATAGGATT 

TTAGAGGCGAAATCGATACAGACTTCCCCCACGGAAGAGGGCGGGGTGCTGACACTGCCTCCT 

GTGGATGGGCTGCCAGGGCGTCCTCCATGCCCCCCTGGGGCTGAAAGTGGACCTCAGACAAAG 

TTCTGTTCAGAGATTTCTTTGATTGTGTCTCCAAGGCGAATATCAGTCCAGCTCGACAGCCAT 

CAGCCCACACAGAGCATCTCACAGCCTCCACCACCTCCATCCCTTCTGTGGTCTGCTGGGCAA 

GGACAGCCTGGGTCACAGCCGCCCCATTCTATTTCTACCGAGTTTCAAACCAGCCACGAACAC 

TCTGTTTCCTCTGCCTTTAAAAACAGCTGTAACATCCCATCTCCACCACCTCTGCCTTGCACA 

GAGTCCTCCAGCTCCATGCCTGGCCTGGGCATGGTGCCTCCCCCACCTCCCCCTCTCCCTGGC 

ATGACAGTGCCTACTCTGCCCAGTACAGCCATTCCCCAACCTCCTCCTCTGCAGGGTACAGAA 

ATGCTGCCACCCCCTCCCCCTCCTCTTCCCGGAGCGGGCATACCTCCTCCGCCGCCTCTACCC 

GGAGCAGGCATACTCCCTCTGGGAGCGGGCATACCCCCACCTCCCCCTCTACCCGGAGCGGGC 

ATACCCCCTCCGCCCCCTCTACCCGGAGTGGGCATACCTCCTCCGCCCCCTCTACCCGGAGCG 

GGCATACCCCCTCCTCCCCCTCTACCCGGAGCGGGCATACCCCCTCCTCCCCCTCTTCCCGGA 

GCGGGCATACCTCCTCCACCCCCTCTACCCAGAGTGGGCATACCCCCTCCGCCCCCACTTCCC 

GGAGCGGGCATACCCCCACCTCCCCCTCTACCCGGAGCGGGCATACCCCCTCCGCCCCCTCTA 

CCTGGAGTGGGAATACCTCCTCCGCCCCCTCTACCTGGAGTGGGAATACCTCCTCCGCCCCCT 

CTACCTGGTGCTGGGATTCCCCCACCTCCTCCCTTGCCAGGTATGGGGATTCCACCTGCTCCA 

GCTCCCCCACTCCCTCCACCTGGGACAGGAATCCCACCGCCCCCTCTGCTTCCTGTATCAGGC 

CCTCCACTCCTCCCACAAGTTGGGAGTAGCACTTTACCAACCCCACAGGTGTGTGGATTTCTT 

CCTCCTCCATTGCCAAGTGGCTTGTTTGGATTAGGGATGAATCAGGACAAAGGGAGTAGGAAG

CAGCCCATAGAGCCTTGTCGACCAATGAAGCCTCTTTACTGGACCAGGATTCAACTACATAGT 

AAAAGAGACTCCAGTACTTCACTTATTTGGGAAAAAATTGAAGAGCCATCCATAGATTGTCAT 

GAATTTGAGGAATTATTTTCTAAAACTGCTGTAAAGGAGAGAAAGAAACCTATCTCTGATACT 
ATCTCAAAGACGAAGGCTAAACAACAGGTTGTCAAGTTATTAAGCAACAAAAGATCACAAGCA
GTTGGAATACTAATGTCTAGCCTTCATTTAGATATGAAAGACATACAACATGCTGTTGTGAAC
TTGGATAATTCTGTGGTTGACCTGGAGACCCTTCAAGCTCTCTATGAGAATAGAGCACAGTCA 
GACGAACTCGAAAAAATAGAAAAGCATGGCCGATCTTCCAAAGACAAGGAAAATGCCAAGTCT 
CTGGACAAACCTGAACAGTTCCTTTATGAACTGTCACTAATCCCCAACTTTTCAGAGCGAGTC 
TTTTGCATCCTGTTCCAGTCCACATTTTCAGAAAGCATTTGCTCAATTCGTCGCAAACTGGAA 
TTACTACAGAAATTGTGTACATTAAAAAATGGCCCAGGGGTTATGCAGGTTCTAGGTTTGGTT 
CTTGCCTTTGGCAACTACATGAATGGAGGAAATAAGACTCGAGGACAGGCAGATGGCTTTGGA 
TTAGACATTCTTCCAAAACTGAAAGATGTCAAGAGCAGTGACAATAGCAGAAGCCTTTTGTCA 
TATATTGTTTCGTATTATCTCCGAAATTTTGATGAGGATGCTGGAAAAGAACAGTGCCTCTTT
CCACTGCCAGAACCCCAGGACCTTTTTCAGGCCTCACAGATGAAGTTTGAAGATTTTCAAAAA
GATCTCAGAAAACTGAAGAAAGACTTGAAAGCCTGTGAAGTTGAAGCAGGGAAAGTATACCAG 
GTCTCCTCAAAAGAGCATATGCAGCCTTTCAAGGAAAACATGGAACAATTTATTATTCAAGCC
AAAATTGACCAAGAGGCAGAGGAAAATTCACTGACAGAGACTCATAAATGCTTTTTGGAGACC 
ACGGCATATTTCTTCATGAAACCAAAACTTGGAGAGAAGGAGGTGTCCCCAAATGCTTTCTTC 
AGTATCTGGCATGAATTCAGCTCTGACTTTAAAGACTTCTGGAAGAAAGAGAACAAACTTCTT 
CTACAAGAGAGAGTAAAAGAAGCCGAAGAGGTGTGTAGACAGAAGAAAGGAAAATCACTTTAT 
AAAATAAAACCAAGACATGACTCTGGAATTAAAGCAAAGATAAGCATGAAAACTTGAACAATG
AAAAGCAGAATGAAAATGAGTCATTGCAACGACTTTCACAAAATTCAGCTGACCTGAGAGTGG


GAGGGAAACTACCGTCATTCTGCTCATGTTTCTTCTTGACCTCTTGCATAATCTTTTTGTTTT 


CTAGACAGTTCACTAATTGTTGAATTTTACTGTATATTCATATAAAAATGCAAACGTACTAGA 


CCAGTGGAGAATTTGACACCTTTTCTTTTTGTAAAAGTTTATGGTATTATACCGATAGACCAA 


AACAGCATGTGTAAGAGGCAGTATCTGCACTAATTCTCAACATGCTAAACATTAACTACAATT 


CACTGTTGTGAGAATATTCCTCfiTC APACJPA A A A AO ArTTTfTTTTrTi rTr araa rr TmTrr 


TCCACATCACAGCATTTAGACATATGGGTAAAATGTTATTTCTAGTGAATTGTTTGTATCAGT


TTCATGTCTAAGTATAAATTTTCTATTTTAAAATTTAAGAACCGTTTATAATCAGTGCTTTCC 


CAACTCTTGGGTTGCTCTCCATAACTATGTATTTGTGAAAGAAAATGGTCATTTTTTTTACTG 


AAGTCATATAATGACTTGGGTCAGCTCGTAATGCATTGTGATGGTTTTGTATGAGCTGGGTGT 


TTTTTTCCATTACTTTTAATGATCTTCGTTGCAAGTTATAGTTGTGGATAAAGGGGAGAATTT 


ATTGCTCTTGCAAJVCCAATTATGGAAAGCAACTTAAGAAJ^CCAATGTTCTAAATCATAATTG 


TTTGTATTTATGTAAAGTATGGTCTCTTACTTTTTAGTTTGTAGTTTAAGTGCAAAGAAACAG 


TAGTGGTTTTTTTTCTATTGTTTTGTAGTCTTCCTGTCCCCTTCAGTCCCTCCAGTGTGTATA 


TTACCATTCTCCAATGAAATAATAGGGCATTTAACAAAGATCGCTATGTGCAATACTGTATTT 


AGTGTTTCTATTTCAATTTTTCTAGGATGTTAATTTATATGAAJ^TAAAATGAATAATAAAAG 


AATAAAGATAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 13    ORF Stop: TGA at 4591




SEQ ID NO: 246    1526 aa    MW at 162002.3 kD


NOV67a,

CG138303-01 Protein
Sequence 


MGNQDGKLKRSAGDALHEGGGGAEDALGPRDVEATKKGSGGKKALGKHGKGGGGGGGGGESGK 
KKSKSDSRASVFSNLRIRKNLSKGKGAGGSREDVLDSQALQTGELDSAHSLLTKTPDLSLSAD 
EAGLSDTECADPFEVTGPGGPGPAEARVGGRPIAEDVETAAGAQDGQRTSSGSDTDIYSFHSA 
TEQEDLLSDIQQAIRLQQQQQQQLQLQLQQQQQQQQLQGAEEPAAPPTAVSPQPGAFLGLDRF
LLGPVSEAPSLPAAQPAAKDSPSSTAFPFPEAGPGEEAAGAPVRGAGDTDEEGEEDAFEDAPR 
GSPGEEWAPEVGEDAPQRLGEEPEEEAQGPDAPAAASLPGSPAPSQRCFKPYPLITPCYIKTT 
TRQLSSPNHSPSQSPNQSPRIKRRPEPSLSRGSRTALASVAAPAKKHRADGGLAAGLSRSADW 
TEELGARTPRVGGSAHLLERGVASDSGGGVSPALAAKASGAPAAADGFQNVFTGERGRTLLEK 
LFSQQENGPPEEAEKFCSRIIAMGLLLPFSDCFREPCNQNAQDQLYTWAAVSQPTHSLDYSEG 
QFPRRVPSMGPPSKPPDEEHRLEDAETESQSAVSETPQKRSDAVQQEVVDMKSEGQATVIQQL
EQTIEDLRTKIAELERQYPALDTEVASGHQGLENGVTASGDVCLEALRLEEKEVRHHRILEAK 
SIQTSPTEEGGVLTLPPVDGLPGRPPCPPGAESGPQTKFCSEISLIVSPRRISVQLDSHQPTQ 
SISQPPPPPSLLWSAGQGQPGSQPPHSISTEFQTSHEHSVSSAFKNSCNIPSPPPLPCTESSS 
SMPGLGMVPPPPPPLPGMTVPTLPSTAIPQPPPLQGTEMLPPPPPPLPGAGIPPPPPLPGAGI 
LPLGAGIPPPPPLPGAGIPPPPPLPGVGIPPPPPLPGAGIPPPPPLPGAGIPPPPPLPGAGIP 
PPPPLPRVGIPPPPPLPGAGIPPPPPLPGAGIPPPPPLPGVGIPPPPPLPGVGIPPPPPLPGA 
GIPPPPPLPGMGIPPAPAPPLPPPGTGIPPPPLLPVSGPPLLPQVGSSTLPTPQVCGFLPPPL 
PSGLFGLGMNQDKGSRKQPIEPCRPMKPLYWTRIQLHSKRDSSTSLIWEKIEEPSIDCHEFEE 
LFSKTAVKERKKPISDTISKTKAKQQVVKLLSNKRSQAVGILMSSLHLDMKDIQHAVVNLDNS
VVDLETLQALYENRAQSDELEKIEKHGRSSKDKENAKSLDKPEQFLYELSLIPNFSERVFCIL
FQSTFSESICSIRRKLELLQKLCTLKNGPGVMQVLGLVLAFGNYMNGGNKTRGQADGFGLDIL
PKLKDVKSSDNSRSLLSYIVSYYLRNFDEDAGKEQCLFPLPEPQDLFQASQMKFEDFQKDLRK 
LKKDLKACEVEAGKVYQVSSKEHMQPFKENMEQFIIQAKIDQEAEENSLTETHKCFLETTAYF 
FMKPKLGEKEVSPNAFFSIWHEFSSDFKDFWKKENKLLLQERVKEAEEVCRQKKGKSLYKIKP 
RHDSGIKAKISMKT 
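

The relationship between the DNA sequence and the encoded polypeptide in Table 67A can be illustrated by translating the first codons of the open reading frame. The sketch below assumes the standard genetic code; the function name translate and the choice of the first 30 nucleotides of the NOV67a ORF (starting at the ATG at position 13) are made here for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA string read in frame from its first base.

    Whitespace is ignored, and any trailing partial codon is dropped.
    When to_stop is true, translation ends before the first stop codon.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nucleotides of the NOV67a ORF as listed in Table 67A.
print(translate("ATGGGGAACCAGGATGGGAAGCTGAAGAGG"))
```

The translated fragment can be compared against the first ten residues of SEQ ID NO: 246.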
Further analysis of the NOV67a protein yielded the following properties shown in
Table 67B. 



Table 67B. Protein Sequence Properties NOV67a


PSort
analysis:




0.7000 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP
analysis:


No Known Signal Sequence Predicted 



A search of the NOV67a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 67C.



Table 67C. Geneseq Results for NOV67a


Geneseq
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV67a 
Residues/ 
Match 
Residues 


Identities/
Similarities for 
the Matched 
Region 


Expect 
Value 



AAU97678 


Human formin protein - Homo
sapiens, 235 aa. [CN1294195-A, 09-
MAY-2001]


1292..1526
1..235 


235/235 (100%) 
235/235(100%) 


e-135 



ABB59865 


Drosophila melanogaster polypeptide 
SEQ ID NO 6387 - Drosophila 
melanogaster, 1059 aa. 
[WO200171042-A2, 27-SEP-2001]


940..1525
480..1055


231/629 (36%) 
317/629(49%) 


1e-99


AAB99983 



Human limb deformation protein 
fragment - Homo sapiens, 199 aa. 
[CN1281047-A, 24-JAN-2001] 


1355..1526
28..199


169/172 (98%) 
171/172 (99%) 


3e-95 


AAB99982


Human limb deformation protein SEQ 
ID NO: 7 - Homo sapiens, 199 aa. 
[CN1281047-A, 24-JAN-2001]


1355..1526
28..199


169/172 (98%) 
171/172 (99%) 


3e-95 


AAW76733 



Mouse mDia Rho targeting protein - 
Mus sp, 1255 aa. [JP10262680-A, 06-
OCT-1998]


887..1503
569..1149 


233/640 (36%) 
325/640 (50%) 


1e-86



In a BLAST search of public sequence databases, the NOV67a protein was found to
have homology to the proteins shown in the BLASTP data in Table 67D.
Table 67D. Public BLASTP Results for NOV67a


Protein
Accession
Number


Protein/Organism/Length 


NOV67a 
Residues/ 
Match
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value


Q9JL04 


Formin 2 - Mus musculus (Mouse),
1567 aa.

1..1526 

1 .. IJU / 


1268/1594 (79%) 
1 "OR/I SQ4 


0.0 


Q9NZ56 


Formin 2 - Homo sapiens (Human), 
632 aa (fragment). 


352..678 
1..332 


323/335 (96%) 
324/335 (96%) 


0.0 


Q05858 


Formin (Limb deformity protein) - 
Gallus gallus (Chicken), 1213 aa. 


940..1525
647..1206


313/595 (52%) 
403/595 (67%) 


e-165 


A41724 


limb deformity (Id) protein - 
chicken, 1213 aa.


940..1525
647..1206


312/595 (52%) 
402/595 (67%) 


e-165 


Q05859 


Formin 1 isoform IV (Limb 
deformity protein) - Mus musculus 
(Mouse), 1206 aa. 


955..1525
642..1199


296/581 (50%) 
383/581 (64%) 


e-156 
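

The Expect values in the tables above appear in several notations, including a bare exponent such as e-165, a full form such as 3e-95, and 0.0. As a hedged illustration, the following sketch normalizes these notations to floating-point numbers so that hits can be ordered; the function name parse_expect is hypothetical, and reading a bare exponent as 1e-N is an assumption about the abbreviated notation.

```python
def parse_expect(value: str) -> float:
    """Convert an Expect value string to a float.

    Assumption: a bare exponent such as "e-165" abbreviates "1e-165".
    """
    text = value.strip().lower()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Expect values as listed in Table 67D.
values = ["0.0", "0.0", "e-165", "e-165", "e-156"]
print(sorted(parse_expect(v) for v in values))
```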



PFam analysis predicts that the NOV67a protein contains the domains shown in the 
Table 67E. 



Table 67E. Domain Analysis of NOV67a 


Pfam Domain 


NOV67a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


FH2 



1087..1491


126/506 (25%)
318/506 (63%) 


5.7e-76 






Example 68. 

The NOV68 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 68A. 



Table 68A. NOV68 Sequence Analysis 


SEQ ID NO: 247    5476 bp


NOV68a,

CG138362-01 DNA
Sequence


GCGGGAGCCCGCAGCGGAGGAGGGGCCGCGGGCGGGAGGCGCTGCGAGGCGCTGCTGGCCCTC 


GGGCTGCGGGAGCCGGGCTAGGACGCCGAGCGGTGACTCCCCGCCGCTCCCGGGGGCCGTGAC 


GAAAGCAAAACTATCTCTCAATATACCTCAGAAACAAAGATGTCTCCATCAAGTTTATACTCA 


CAGCAAGTGCTATGTTCTTCAATACCTTTATCGAAAAATGTGCACAGTTTTTTCAGTGCCTTC 
TGCACAGAAGATAATATTGAACAGAGTATCTCATATCTTGATCAGGAATTGACTACTTTTGGT
TTTCCTTCATTATATGAAGAATCCAAAGGTAAAGAGACAAAGAGAGAGTTAAATATAGTAGCT 
GTACTAAATTGTATGAATGAGCTGCTTGTGCTTCAGCGGAAGAACCTTCTAGCTCAGGAAAAT 
GTGGAGACACAGAATTTGAAGCTGGGAAGTGATATGGACCATCTACAGAGCTGCTACTCAAAA 
CTTAAGGAACAACTGGAAACCTCCAGGAGGGAAATGATTGGGCTTCAGGAAAGAGACAGACAG 
TTACAATGTAAGAACAGGAATTTGCATCAGCTACTAAAGAATGAGAAAGATGAGGTGCAAAAA 
TTACAAAATATCATTGCAAGTCGAGCTACTCAGTATAATCATGATATGAAGAGAAAAGAGCGT 
GAATATAATAAACTGAAGGAACGTCTACATCAACTTGTTATGAACAAGAAAGATAAGAAAATA
GCTATGGACATTTTGAATTATGTCGGGAGAGCTGATGGAAAAAGAGGCTCCTGGAGGACTGGT 
AAAACTGAAGCCAGGAATGAAGATGAAATGTATAAAATTCTCTTGAATGATTATGAATATCGT 
CAGAAACAAATCCTAATGGAAAATGCAGAACTTAAGAAGGTTCTTCAACAAATGAAAAAGGAA 
ATGATTTCTCTTCTTTCTCCCCAAAAGAAGAAACCTAGAGAAAGAGTAGATGATAGTACAGGA 
ACTGTTATTTCCGATGTTGAAGAAGATGCCGGGGAACTAAGCAGAGAGAGTATGTGGGACCTT 
TCCTGTGAAACTGTGAGAGAGCAGCTTACAAACAGCATCAGAAAACAGTGGAGAATTTTGAAA 
AGTCATGTAGAAAAGCTTGATAACCAAGTTTCAAAGGTACACCTGGAAGGTTTTAATGATGAA 
GATGTAATCTCACGACAAGACCATGAACAAGAAACTGAAAAACTCGAGTTAGAAATTCAGCAG 
TGTAAAGAAATGATTAAAACTCAGCAACAGCTTTTACAGCAGCAGCTCGCTACTGCATATGAT
GATGATACCACTTCACTATTACGAGACTGTTATTTGTTGGAAGAAAAGGAACGTCTCAAAGAA
GAATGGTCCCTTTTTAAAGAGCAGAAAAAGAATTTTGAGAGGGAGAGACGAAGCTTTACAGAA 
GCCGCTATTCGCCTGGGATTGGAGAGAAAGGCATTTGAAGAAGAAAGAGCCAGTTGGTTAAAG 
CAGCAGTTTCTAAATATGACTACCTTTGACCACCAGAACTCAGAAAATGTGAAACTTTTCAGT
GCCTTCTCAGGAAGTTCTGATTGGGACAATCTTATAGTGCACTCGAGGCAGCCGCAAAAGAAG 
CCTCACAGTGTGTCTAATGGGTCTCCAGTTTGCATGTCTAAACTTACTAAATCTCTTCCTGCT 
TCACCTTCCACTTCAGACTTTTGCCAGACACGTTCCTGCATATCTGAACATAGTTCAATCAAT
GTACTGAATATAACTGCTGAAGAAATTAAACCAAATCAGGTTGGAGGAGAACGTACAAATCAA 
AAATGGAGTGTGGCGTCAAGACCTGGATCACAGGAAGGTTGCTATAGTGGATGCTCCTTGAGC 
TACACAAATTCTCATGTAGAAAAAGATGACTTACCTTAGACATGTGGACTGGAATTTTTTTCA
TTAATGTGTTCATCAAGTTTCACATCTAAGTTGAAACAGGGTGTGTCATAAAGTCAGTTATCT 



CTAATAACTTAAGATGGTCTGAGTTGTTTGTTTGGACTTCCCTGTCTTCCCCCAAAGAGTTGA 



AATCTTAAATCTATTTAAAAGGATATAAAAGCTTTGGATATGTATTTTTAGTAACAGAAGCAT 



CTGGTTCTGTGAATAAAGGAATGTATAGATGTTTGGATGGAAACAAAAGCACTAGACTGAGTT 



TCCTCTTATAGGTATTAAAAATAGCACTTTTAGGAAACTGATTATTGTAAATGTTTAATTTTG 



TCTCAAATATAGTTGGCATTGGAAGTTTAGCCTTTACTTGAATGTATACTGTAGATTTTTAAC 



AAAGCGAGTTCTATATTTATTATGTTTAGTGTGGTTTGAAATTACCTCTTTCATATGTTTTAA 



ATAAAGTGAAATTTATGTATGTTTTGTACATAGATACACATGATTATGTTAAGAGGCTTTAAG 



ATTTAAAAGTTTCACACAACCATAAGTATAGTATTTCATGCCAGTAAAATTTTTTTAGTGGTA 



TTCTGTTTACAGATGTATTAGGACCATTGATGCATTACATTTAAGAATTCTCTTTAATACATC 



TGGGCAATAAATATTGAAAGGTATTCCATGAAGCTGAGTTCTTTAGATAATCAACACTACTAA 



CATTACATTTTTGAGATTTTTATGACATTAGATTTTTATTTTGTATATGTAGAATATTATAAT 



TTTTAAAAGGACTATTGATGATAGAAGAATAGGGGCAAGACGACAAAAGTACCTTTGAATAAA 



ACAATTTAAGAAATTGGTTTAAGATATTGGATGATAGAAGATATTTAAGATATCTAGATGGTG 



ATATTTTCCTTACAAGATGGGTACCAGTATAGTAATATCTGTATACTAACTAGGGCTTTGTAT 



TGTCAATAATTTTTTAATAATTTTTTAATGAGGTATTTACCACTGAAGAAATATGATAATATA 



AAACCATCAAATTTTATAATTGAGATGATACTCTGGAAAAACATGTCATTTCATTTTCAGAAA



ACTCTTAAGCTCTCTTCAGTCTCTGTAATGTTTCTGATTGCATGTTTCTTCATGAAAAGTATG 



TTGTTGTTTTGATAGTAATAATAATAAATGTAGGCTCAGCTCTTTCCCAGGATTTTCATCAAA



AAGCTTTAAGTGCCTAACCCTGCTTGTCTCTGTACATAGAAGCCTGCACAGATCCAACCCTTG 



CTAGGTATCATAGTTTAGGCCCATTTACCTTCCCCTGTACTGGCAGTTCAGCCGCTTACATGC 



ACTCACCCTGTTTGTGGCTATTTTAAATTCATATTATTAAAAAACAAAAAAAACCCACCTATT 



TGTGTTGTCCATTCACTATACGTAACTATGTAACTCTTTGGAGTTGCATAGCAGCAGCCATTT 



TTTCAGGGCTGATAGGATATCATTAAGAGTGTCTTATGAGACATTAGTGGATATACTCATAGT 



AACCATATTTATAGTTTTAAAGAGCTAGCTCTTGAGCATTAGTCACTACCTTCAGCTTGATGC 



ATGGTCACTTCTTTACTATTTAAAATACTACACATTGTACAAAATATCGAAGACACTACCATA



TGCTAAAGGAAGAAATCTAGCTGGGATATAGGATTTTTGTTGTTTTTGTTTTTGTTTGTTTTG



CTATTTAGCAATAACATGGTCAAAAAGGCAATCAGAAATTTTAAATACAGTTAATGGATACAT 



TTGGCAACAATTTTCCCCCGAGGGTTTTCCATGGTGTACTTTGCAAGAAATAAGCACTCTAAT 



TTTTAAAGTAAATCTCTTATTTTAGCAAATATTATATTTCATGACAATGGAGTTCTAGAAAGC 



AGCATCTGTTTTTTGTTGCGTGTTTCCATTTTAAAGTGAGTTGAGTTCTCAAATTGGAAAGAA 



AGATTCCTTGAGACGTACTTTTAAAATCTAAAGTGTGAAAGAAACAGCAGAGTAAAAGCCAGA 



CTCATTGCACCTTCAATGTCTGCATAGATCCAGAAGTTGTACATTTTACCTAACAACATCACT 



TTTGTTGAACATTCCAACTCCAGAATGATCCCCAATCACCCTAATCTCAGAATGCTGGAATGA 



TGTCTGTTGGAAAACCCAGGACTCCACACACAAAACTCCTGGGATTTTGTTTCCCATCTCTTT 



CTAGGTGTTTGCAATGTACAAATAATACAGCTGTGCTAATCTCACATTTAGCCATGATAGATG 



ATGGTTCTAGAGTGTACTTCCATTTGTAAGTCCTCCTGATAAGTGCTTTCTTGTTTATCACTA



TGTAAATCTGAAATATTTTGTACTTCATTTGTTTTTATCCATTCTGAATTTTCAAAGCATAAA 



AATGTCAAAGAAAAATTGAGATAAACATTTTGTTCACCATTAAATGTTTGCATCTGCAGTCTA 



TGATGAGAAACAATGAAATTATGACTAAAATTTAACAAATAGTATTTTCTTATAGAAGCAATG 



TATTAAATGATTAGATACACAGGACAGTTCACAGCAAATTGTTTACTTACATGGCATAATTGA 



AATACTGTTATTTGTAAAGAGTTCTATACTGCCTACCTTAGCTAACCAGAATTGTTCATAGAC 



TTGTGATAATATGGTGTTATGGGAAAACACATTTTGTGCGCATGATGAAGAGAGCACCGTGAA 
TCTAGTCTTGACTGCCCCTGGTGGAACATTTGAAAGCCCTTGGGACTCCAGGAGAGATGTTTT 
TGCAGCATGTATGCTAAATTTGTGAGTTAGCAAGACCCTGCTGTGATTACTATGTTGTACAGA 
TCTTCAAAGTATATTGCCTAATATGAGTCATGTTATTTCTAGATTGAATTTAAAGTAAGTATG 
GATTATGGAAGTCTAGTAAATATACTTTTCCCTATTTTTGTCCTTTTTCAGTCTTTTGGTAAG


CATTTATTAAGTACCTGCTGTATGTCAAGGCTTACTACTGCTTTCCAATCCTTGGTGATATGA 


AGACAGATAAGCACGCTGCCTGCTATTAGGAATCTGAGCTGAGTGGAAGGCCAACTATTAAAC 


CTACATTGTAATAAACAATGGTGGATGCTACGGACACATTCAGTCTGTTCATAAAACATGGAC 




TGATGGGTCTGCTCCAGGCCACATTTGTTTTTTAACAGTAACGTGGCTAGGCTTCCATTTATA 


AGTCTTAGCATTATTTCCTTTGTGAAGTACTATGTAACAGATGATTGTTTGCTAGATTTTGTT 




TTCTACAATCAAAATGTTGACCTGCAAAGCAGTGTAGGATTTTCTCTCCTCAAGAGCGTGTAT


TATTATCTAGTAGAAAAAGCATTCCCAAAGACTTGGTCCATGCAGATAAGGATAATGAAATTG 


CTCACTCTAATCCTTTTTCTAAATACAGTGTTTTTCAAGCTGGATGTAAATTAGAGTGGGTGG 


ATATTGATTAAATTATTTGATTTATATGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 166 


ORF Stop: TAG at 1927




SEQ ID NO: 248    587 aa    MW at 68326.4 kD


NOV68a,

CG138362-01 Protein
Sequence


MSPSSLYSQQVLCSSIPLSKNVHSFFSAFCTEDNIEQSISYLDQELTTFGFPSLYEESKGKET 
KRELNIVAVLNCMNELLVLQRKNLLAQENVETQNLKLGSDMDHLQSCYSKLKEQLETSRREMI 
GLQERDRQLQCKNRNLHQLLKNEKDEVQKLQNIIASRATQYNHDMKRKEREYNKLKERLHQLV 
MNKKDKKIAMDILNYVGRADGKRGSWRTGKTEARNEDEMYKILLNDYEYRQKQILMENAELKK
VLQQMKKEMISLLSPQKKKPRERVDDSTGTVISDVEEDAGELSRESMWDLSCETVREQLTNSI 
RKQWRILKSHVEKLDNQVSKVHLEGFNDEDVISRQDHEQETEKLELEIQQCKEMIKTQQQLLQ 
QQLATAYDDDTTSLLRDCYLLEEKERLKEEWSLFKEQKKNFERERRSFTEAAIRLGLERKAFE
EERASWLKQQFLNMTTFDHQNSENVKLFSAFSGSSDWDNLIVHSRQPQKKPHSVSNGSPVCMS 
KLTKSLPASPSTSDFCQTRSCISEHSSINVLNITAEEIKPNQVGGERTNQKWSVASRPGSQEG 
CYSGCSLSYTNSHVEKDDLP 



Further analysis of the NOV68a protein yielded the following properties shown in 
Table 68B. 



Table 68B. Protein Sequence Properties NOV68a


PSort
analysis:


0.5500 probability located in endoplasmic reticulum (membrane); 0.1900 probability 
located in lysosome (lumen); 0.1800 probability located in nucleus; 0.1000 
probability located in endoplasmic reticulum (lumen) 


SignalP
analysis:


No Known Signal Sequence Predicted 




A search of the NOV68a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 68C.



Table 68C. Geneseq Results for NOV68a


Geneseq
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV68a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB93250 


Human protein sequence SEQ ID 
NO: 12265 - Homo sapiens, 417 aa.
[EP1074617-A2, 07-FEB-2001]


171..587
1..417


415/417(99%) 
415/417(99%) 


0.0 


AAG03000 


Human secreted protein, SEQ ID NO: 
7081 - Homo sapiens, 137 aa. 
[EP1033401-A2, 06-SEP-2000]


245..381
1..137


137/137(100%) 
137/137(100%) 


2e-72 
AAG40470


Arabidopsis thaliana protein fragment
SEQ ID NO: 50219 - Arabidopsis
thaliana, 382 aa. [EP1033405-A2, 06-
SEP-2000]


29..381
28..361 


89/354 (25%) 
182/354 (51%) 


2e-29 


AAG48364


Arabidopsis thaliana protein fragment 
SEQ ID NO: 61066 - Arabidopsis 
thaliana, 347 aa. [EP1033405-A2, 06- 
SEP-2000] 


18..336
13..339


83/335 (24%) 
166/335(48%) 


2e-26 


AAG33165


Zea mays protein fragment SEQ ID 
NO: 40144 - Zea mays subsp. mays,
291 aa. [EP1033405-A2, 06-SEP-
2000]


102..379
3..266 


69/280 (24%) 
148/280 (52%) 


2e-22 



In a BLAST search of public sequence databases, the NOV68a protein was found to
have homology to the proteins shown in the BLASTP data in Table 68D. 



Table 68D. Public BLASTP Results for NOV68a 


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV68a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q9UIX0 


Hypothetical 71.3 kDa protein -
Homo sapiens (Human), 614 aa. 


1..587 
28..614 


587/587 (100%) 
587/587(100%) 


0.0 


Q9Y2D8 


KIAA0923 protein - Homo sapiens 
(Human), 614 aa. 


1..587
28..614 


586/587 (99%) 
586/587 (99%) 


0.0 


Q8VC66 


Hypothetical 71.0 kDa protein - Mus 
musculus (Mouse), 615 aa. 


1..587 
28..615 


517/588 (87%) 
549/588 (92%) 


0.0 


AAH31527 


Expressed sequence AU014939 - 
Mus musculus (Mouse), 615 aa. 


1..587 
28..615 


516/588 (87%) 
549/588 (92%) 


0.0 


Q9FIE0 


Genomic DNA, chromosome 5, P1
clone:MSF19 - Arabidopsis thaliana 
(Mouse-ear cress), 276 aa. 


29..266 
12..242 


68/239 (28%) 
132/239(54%) 


3e-25 






PFam analysis predicts that the NOV68a protein contains the domains shown in the 
Table 68E. 



Table 68E. Domain Analysis of NOV68a 



Pfam Domain 



NOV68a Match Region 



Identities/ 
Similarities 

for the Matched Region 



Expect Value 






WO 03/023002 




PCT/US02/28539 



Example 69. 

The NOV69 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 69A. 



Table 69A. NOV69 Sequence Analysis


SEQ ID NO: 249


2254 bp


NOV69a,

CG138452-01 DNA
Sequence


CTGGACTGGCCATTATGGAGGAGGAGCTGCAGCATTCCCATTGTGTGAATTGTGTCAGTAGAC 
GGTGCATGACCAGGCCAGAGCCAGGGATTTCCTGTGATTTGATTGGTTGTCCATTGGTTTGTG 
GTGCAGTTTTCCATTCTTGTAAAGCTGATGAGCATCGACTTTTATGTCCATTTGAACGAGTGC 
CTTGCTTAAATAGTGACTTTGGATGTCCATTTACCATGGCCCGAAATAAAGTTGCTGAACATC 
TAGAAATGTGTCCTGCAAGTGTGGTGTGCTGTACTATGGAATGGAATCGATGGCCAGTTAGTT 
ATGCAGACCGGAAATCATATGAAAATCTAAGCAGAGATGTCGATGAAGTGGCACAATTGGATA 
TGGCCTTGGCTCTTCAAGACCAAAGGATGCTCTTAGAATCCCTCAAAGTAGCCACCATGATGT 
CAAAAGCAACTGATAAAGTATCCAAACCTAGAGAACAAATCTCAGTTAAATCAAGTGTCCCAG 
AAATACCACATGCTAATGGTTTAGTGTCTGTTGATGAAGAATCTTATGGTGCACTTTATCAAG 
CTACTGTAGAAACAACCAGAAGTTTGGCTGCTGCTTTGGATATCCTGAATACTGCTACAAGAG 
ACATTGGCATGTTAAATACAAGTGTCCCAAATGACATGGATGAACAGCAAAATGCGAGAGAAA 
GCTTAG AGGATCAAAACTTG AAAG AC TAAfJ ATC ATCTTT ATCl ACld AHH A A A T ar;r £ C C a CT & r 

GTGGAATTGACTACAATGACACAAATCAGAATGCCCAGTCTGAACAAAATGGTTCAAGTGATT 
TATTATGTGACTTGAATACAAGTTCTTATGACACTTCTGCTCTTTGTAATGGCTTTCCTTTGG 
AAAATATATGTACCCAGGTCATAGACCAGAATCAGAATTTACATGGTGATTCAAAACAAAGTA 
ACTTAACAAATGGAGACTGTGTGGCATCATCAGATGGCACTTCAAAACCTTCCAGCTCACTTG 
CGGTGGCAGCACAACTTAGGGAAATAATACCATCCAGTGCTTTGCCTAATGGCACAGTTCAGC
ATATCCTCATGCCAGATGATGAAGGTGAAGGTGAATTGTGTTGGAAAAAAGTAGACTTAGGGG
ACGTGAAGAATGTGGATGTCTTATCTTTCAGTCATGCTCCTTCATTCAATTTTCTTTCTAATT 
CATGTTGGTCTAAACCAAAGGAAGATAAAGCAGTAGATACATCAGATTTGGAAGTTGCAGAAG 
ATCCTATGGGCCTCCAAGGAATAGATCTGATCACAGCAGCATTGCTTTTTTGTCTAGGAGATT 
CTCCAGGAGGGAGGGGTATATCTGATAGCCGCATGGCTGATATTTATCACATTGACGTTGGGA
CTCAGACTTTTTCACTTCCATCTGCAATATTAGCTACAAGTACAATGGTTGGGGAGATAGCTT 
CAGCTTCAGCTTGTGATCATGCCAATCCACAGCTTTCAAATCCAAGTCCGTTTCAGACACTTG 
GGCTGGATTTAGTATTGGAATGTGTCGCTAGGTACCAACCCAAGCAGCGTTCAATGTTTACCT 
TTGTGTGTGGACAGTTATTTAGAAGGAAAGAATTTTCTTCCCACTTTAAGAATGTGCATGGTG 
ACATTCATGCTGGACTCAATGGCTGGATGGAACAGAGGTGCCCTTTAGCTTACTATGGTTGTA 
CCTATTCTCAGCGTAGATTTTGTCCATCAATACAAGGAGCAAAGATTATACATGACCGCCATT 
TGAGGTCATTTGGAGTTCAGCCATGTGTATCTACAGTATTAGTGGAGCCTGCTAGAAACTGTG 
TGTTGGGATTACATAATGACCATCTAAGTAGTCTTCCTTTTGAGGTCCTGCAGCATATTGCAG 
GCTTTCTCGATGGCTTCAGCTTATGTCAGCTCTCATGTGTATCCAAGTTAATGAGGGATGTGT
GTGGCAGCCTGCTTCAGTCTCGTGGCATGGTCATACTGCAGTGGGGGAAAAGGAAGTATCCAG 
AAGGAAATTCATCATGGCAGATAAAAGAAAAGGTATGGCGATTTAGTACTGCATTTTGTTCTG 
TTAATGAATGGAAATTTGCTGACATCCTAAGCATGGCAGACCACTTGAAGAAATGCAGTTACA 
ATGTTGTCGAGAAACGGGAGGAAGCAATCCCTTTGCCATGTATGTGTGTGACACGAGAACTCA 
CTAAAGAAGGACGTTCACTACGCTCAGTTTTAAAACCTGTACTTTAAAA




ORF Start: ATG at 15 




ORF Stop: TAA at 2250 




SEQ ID NO: 250 | 745 aa | MW at 82303.3kD


NOV69a, 

CG138452-01 Protein
Sequence 


MEEELQHSHCVNCVSRRCMTRPEPGISCDLIGCPLVCGAVFHSCKADEHRLLCPFERVPCLNS 
DFGCPFTMARNKVAEHLEMCPASVVCCTMEWNRWPVSYADRKSYENLSRDVDEVAQLDMALAL
QDQRMLLESLKVATMMSKATDKVSKPREQISVKSSVPEIPHANGLVSVDEESYGALYQATVET 
TRSLAAALDILNTATRDIGMLNTSVPNDMDEQQNARESLEDQNLKDQDHLYEEEIGAVGGIDY 
NDTNQNAQSEQNGSSDLLCDLNTSSYDTSALCNGFPLENICTQVIDQNQNLHGDSKQSNLTNG 
DCVASSDGTSKPSSSLAVAAQLREIIPSSALPNGTVQHILMPDDEGEGELCWKKVDLGDVKNV 
DVLSFSHAPSFNFLSNSCWSKPKEDKAVDTSDLEVAEDPMGLQGIDLITAALLFCLGDSPGGR
GISDSRMADIYHIDVGTQTFSLPSAILATSTMVGEIASASACDHANPQLSNPSPFQTLGLDLV
LECVARYQPKQRSMFTFVCGQLFRRKEFSSHFKNVHGDIHAGLNGWMEQRCPLAYYGCTYSQR 
RFCPSIQGAKIIHDRHLRSFGVQPCVSTVLVEPARNCVLGLHNDHLSSLPFEVLQHIAGFLDG 
FSLCQLSCVSKLMRDVCGSLLQSRGMVILQWGKRKYPEGNSSWQIKEKVWRFSTAFCSVNEWK 
FADILSMADHLKKCSYNVVEKREEAIPLPCMCVTRELTKEGRSLRSVLKPVL
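
The ORF coordinates and the polypeptide length reported in the sequence analysis tables can be cross-checked arithmetically. The following is a minimal sketch, not part of the original disclosure; it assumes that "ORF Start" is the 1-based position of the first base of the ATG and that "ORF Stop" is the 1-based position of the first base of the stop codon. The function name `orf_codon_count` is hypothetical.

```python
def orf_codon_count(orf_start: int, orf_stop: int) -> int:
    """Return the number of coding codons in an open reading frame.

    orf_start: 1-based position of the first base of the start codon.
    orf_stop:  1-based position of the first base of the stop codon
               (assumed convention; the stop codon itself is not counted).
    """
    span = orf_stop - orf_start
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3


# Coordinates as reported for NOV69a in Table 69A (ATG at 15, TAA at 2250).
print(orf_codon_count(15, 2250))
```

Under this assumed convention the count can be compared with the residue count stated on the corresponding SEQ ID NO line; a mismatch would suggest a transcription error in either the coordinates or the sequence.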
Further analysis of the NOV69a protein yielded the following properties shown in
Table 69B. 



Table 69B. Protein Sequence Properties NOV69a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP
analysis: 


No Known Signal Sequence Predicted 
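
The PSort entries in the protein sequence property tables follow a fixed "probability located in compartment" pattern, so they can be read into a ranked list. The sketch below is illustrative only; the function name `parse_psort` and the regular expression are assumptions, and the input string is the PSort text of Table 69B joined onto one line.

```python
import re


def parse_psort(text: str) -> list[tuple[str, float]]:
    """Extract (compartment, score) pairs from a PSort summary string,
    sorted from the highest score to the lowest."""
    pattern = re.compile(r"(\d\.\d+) probability located in ([^;]+)")
    hits = [(m.group(2).strip(), float(m.group(1))) for m in pattern.finditer(text)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# PSort text for NOV69a as given in Table 69B.
psort_69b = (
    "0.4500 probability located in cytoplasm; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)

for compartment, score in parse_psort(psort_69b):
    print(f"{score:.4f}  {compartment}")
```

The reported scores need not sum to one across compartments, so they are best read as a ranking rather than as a normalized distribution.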




A search of the NOV69a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 69C.



Table 69C. Geneseq Results for NOV69a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV69a 
Residues/ 
Match 
Residues 


Identities/
Similarities for
the Matched
Region


Expect
Value


AAU14823 


Novel bone marrow polypeptide #29 - 
Homo sapiens, 415 aa. 
[WO200155442-A2, 02-AUG-2001]


133..515
21..403


383/383 (100%) 
383/383 (100%) 


0.0 


AAU14910 


Novel bone marrow polypeptide #116
- Homo sapiens, 101 aa.
[WO200155442-A2, 02-AUG-2001]


658..745
14..101


88/88 (100%)
88/88 (100%)


5e-47


AAM23608 


Human EST encoded protein SEQ ID 
NO: 1133 - Homo sapiens, 86 aa.
[WO200154477-A2, 02-AUG-2001]


666..733
2..69 


23/68 (33%) 
39/68 (56%) 


6e-08 


ABB69873 


Drosophila melanogaster polypeptide 
SEQ ID NO 36411 - Drosophila
melanogaster, 3232 aa.
[WO200171042-A2, 27-SEP-2001]


188..329
1527..1678


38/157 (24%) 
68/157(43%) 


0.37 


ABB71150


Drosophila melanogaster polypeptide
SEQ ID NO 40242 - Drosophila
melanogaster, 2858 aa.
[WO200171042-A2, 27-SEP-2001]


134..345
758..944 


46/217(21%) 
84/217(38%) 


1.4 



In a BLAST search of public sequence databases, the NOV69a protein was found to
have homology to the proteins shown in the BLASTP data in Table 69D.



Table 69D. Public BLASTP Results for NOV69a



Protein 

Accession 

Number 


Protein/Organism/Length 


NOV69a
Residues/
Match
Residues


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q8TB52 


Similar to RIKEN cDNA
1700026A16 gene - Homo sapiens
(Human), 745 aa.


1..745
1..745


745/745 (100%)
745/745 (100%)


0.0 


Q9D9X5 


1700026A16Rik protein - Mus
musculus (Mouse), 746 aa.


1..745
1..746 


653/749(87%) 
696/749 (92%) 


0.0 


Q9BXZ7 


F-box domain protein - Homo 
sapiens (Human), 390 aa. 


356..745 
1..390 


388/390 (99%) 
389/390 (99%) 


0.0 


Q9UH90 


Muscle disease-related protein - 
Homo sapiens (Human), 709 aa. 


7..733 
12..693 


254/745 (34%) 
376/745 (50%) 


e-111


Q9ULM5 


KIAA1195 protein - Homo sapiens
(Human), 717 aa (fragment).


7..733
20..701


251/745 (33%) 
375/745 (49%) 


e-109 
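
The "Identities/Similarities" cells in the Geneseq and BLASTP tables have the form matched/aligned (percent). As a hedged illustration of how such a cell may be read and the percentage recomputed from the two counts, the sketch below uses a hypothetical helper `parse_fraction`; the pattern it accepts is an assumption based on the cells shown in the tables above.

```python
import re


def parse_fraction(cell: str) -> tuple[int, int, float]:
    """Parse a cell such as '653/749(87%)' into
    (matched positions, aligned positions, recomputed percent)."""
    m = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*\(\s*\d+\s*%\s*\)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized identity cell: {cell!r}")
    matched, aligned = int(m.group(1)), int(m.group(2))
    if aligned == 0 or matched > aligned:
        raise ValueError(f"inconsistent counts in cell: {cell!r}")
    return matched, aligned, 100.0 * matched / aligned


# Identity cell for the mouse homolog Q9D9X5 in Table 69D.
matched, aligned, percent = parse_fraction("653/749(87%)")
print(matched, aligned, round(percent, 2))
```

Recomputing the percentage from the counts is a simple way to detect cells in which a digit was lost or altered during transcription.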



PFam analysis predicts that the NOV69a protein contains the domains shown in the 
Table 69E. 



Table 69E. Domain Analysis of NOV69a 


Pfam Domain 


NOV69a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


F-box 


611..658


15/48 (31%)
34/48 (71%)


1.5e-05
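
A match region such as the one listed in the domain analysis table denotes an inclusive, 1-based residue range. The following sketch, offered only as an illustration, shows how such a range may be parsed and used to excise the corresponding segment from a polypeptide string. The helper names `parse_range` and `slice_region` are hypothetical, and the tolerance for stray spaces around the dots is an assumption made to cope with scanned text.

```python
import re


def parse_range(cell: str) -> tuple[int, int]:
    """Parse '611..658' (optionally with stray spaces) into (start, end)."""
    m = re.fullmatch(r"\s*(\d+)\s*\.\s*\.\s*(\d+)\s*", cell)
    if m is None:
        raise ValueError(f"unrecognized range: {cell!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid range: {cell!r}")
    return start, end


def slice_region(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein string."""
    if end > len(protein):
        raise ValueError("range extends past the end of the sequence")
    return protein[start - 1:end]


start, end = parse_range("611..658")
print(end - start + 1)  # length of the F-box match region
print(slice_region("MEEELQHSHC", 2, 4))  # toy example on the first residues of NOV69a
```

The region length computed this way can be compared with the denominator of the identity cell for the same domain.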






Example 70. 

The NOV70 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 70A. 



Table 70A. NOV70 Sequence Analysis




SEQ ID NO: 251 


753 bp 




NOV70a, 

CG138781-01 DNA
Sequence 


ATGCAGGACCCCAACGCAGACACTGAGTGGAATGACATCTTACGCAAAAAGGGTTTCTTACCC 
GCCAAGGAAGAATTGGAAGAATTGGAAGAGGAGGCAGAAGAGGAGCAGCGCATCCTCCAGCAG
TCAGTGGTGAAAACATATGAAGATATGACTTTGGAAGAGCTGGAGGATCACGAAGGCGAGTTT 
AATGAGGAGGATGAATGTGCTATTGAAATGTACAGACAGCAGAGACTGGCTGAGTGGAAAGCA 
ACTAAACTGAAGAATAAATCTGGAAAAGTTTTGGAGATCTCAGGGAAGGATTATGTTCAAGAA 
GTTACCAAAGCTGGCGAGGGCTTGTGGGTCATCTTGCACCTTTACAAACAAGGAATTCCCCTC 
TGTGCCCTGATAAATCAGCACCTCAGTGGACTTGCCAGGAAGTTTCCTGATGTCAAATTTATC 
AAAGCCATTTCAACAACCTGCATACCCAATTATACTGATAGGAATCTGCCCACGATATTTGTT 
TACCTGGAAGGAGATATCAAGGCTCAGTTTATTGGTCCTCTGGTGTTTGGCGGCATGAACCTG 
ACAAGAGATGAGTTGGAATGGAAACTGTCTGAATCTGGAGCAATTACGACAGACCTGGAGGAA 
AACCCTAAGAAGCCGATTGAAGACGTGTTGCTCTCCTCAGTGCGGCGCTCTGTCCTCATGAAG 
AGGGACAGCGATTCCAAGGGTGACTGAGGCTACAGCTGCTATCCCATGCCGAACTTTCTT




ORF Start: ATG at 1 | ORF Stop: TGA at 718




SEQ ID NO: 252 | 239 aa | MW at 27409.8kD


NOV70a, 

CG138781-01 Protein 
Sequence


MQDPNADTEWNDILRKKGFLPAKEELEELEEEAEEEQRILQQSVVKTYEDMTLEELEDHEGEF
NEEDECAIEMYRQQRLAEWKATKLKNKSGKVLEISGKDYVQEVTKAGEGLWVILHLYKQGIPL
CALINQHLSGLARKFPDVKFIKAISTTCIPNYTDRNLPTIFVYLEGDIKAQFIGPLVFGGMNL
TRDELEWKLSESGAITTDLEENPKKPIEDVLLSSVRRSVLMKRDSDSKGD 
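
The relationship between a nucleotide sequence and its encoded polypeptide can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and a 1-based ORF start position; the names `CODON_TABLE` and `translate` are hypothetical and the routine is not asserted to be the method used to prepare the tables. Whitespace in the input is ignored so that sequence lines may be pasted as printed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, orf_start: int = 1) -> str:
    """Translate DNA from a 1-based ORF start up to the first stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First line of the NOV70a DNA sequence (ORF start at position 1).
first_line = "ATGCAGGACCCCAACGCAGACACTGAGTGGAATGACATCTTACGCAAAAAGGGTTTCTTACCC"
print(translate(first_line))
```

Comparing the translated string with the listed polypeptide, line by line, is one way to locate residues that were misread when the sequence listing was scanned.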



Further analysis of the NOV70a protein yielded the following properties shown in 
Table 70B. 



Table 70B. Protein Sequence Properties NOV70a 


PSort 
analysis: 


0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial 
matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability 
located in endoplasmic reticulum (membrane) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 






A search of the NOV70a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 70C.



Table 70C. Geneseq Results for NOV70a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date] 


NOV70a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB68507 


Human GTP-binding associated protein 
#7 - Homo sapiens, 239 aa. 
[WO200105970-A2, 25-JAN-2001]


1..239
1..239 


226/239 (94%) 
231/239 (96%) 


e-128 


AAE02001


Human viral IAP-associated factor
(VIAF) - Homo sapiens, 239 aa. 
[WO200134798-A1, 17-MAY-2001] 


1..239 
1..239 


226/239 (94%) 
231/239 (96%) 


e-128 


AAU27979 


Mouse contig polypeptide sequence 
#132 - Mus musculus, 243 aa. 
[WO200164834-A2, 07-SEP-2001] 


1..239
5..243


226/239 (94%) 
231/239 (96%) 


e-128 


AAU27807 


Human full-length polypeptide 
sequence #132 - Mus musculus, 239 aa. 
[WO200164834-A2, 07-SEP-2001]


1..239
1..239


226/239 (94%) 
231/239 (96%) 


e-128 


AAB43903 


Human cancer associated protein 
sequence SEQ ID NO: 1348 - Homo
sapiens, 243 aa. [WO200055350-A1,
21-SEP-2000]


1..239
5..243 


226/239 (94%) 
231/239 (96%) 


e-128 
In a BLAST search of public sequence databases, the NOV70a protein was found to
have homology to the proteins shown in the BLASTP data in Table 70D. 



Table 70D. Public BLASTP Results for NOV70a 


Protein 

Accession 

Number 


Protein/Organism/Length


NOV70a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q9H2J4 


HTPHLP (Unknown) (Protein for 
MGC:3062) - Homo sapiens 
(Human), 239 aa. 


1..239 
1..239


226/239 (94%) 
231/239(96%) 


e-128 


CAC40344 


Sequence 3 from Patent WO0134798
- Mus musculus (Mouse), 240 aa. 


1..239 
1..240 


207/240 (86%) 
223/240 (92%) 


e-116


Q99JX2 


Similar to RIKEN cDNA
1110061A19 gene - Mus musculus
(Mouse), 240 aa.


1..239 
1..240 


206/240 (85%) 
222/240 (91%) 


e-115 


Q9D0W3 


1110061A19Rik protein - Mus
musculus (Mouse), 239 aa.


1..239
1..239 


207/240 (86%) 
222/240 (92%) 


e-114


CAC40345 


Sequence 5 from Patent WO0134798
- Brachydanio rerio (Zebrafish)
(Zebra danio), 239 aa.


1..239
1..237


170/239(71%) 
203/239 (84%) 


2e-97 
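
The Expect values in the tables above appear in several notations, including the shorthand "e-128", the full form "2e-97", and "0.0". The sketch below shows one way to bring them to a common numeric form for sorting or thresholding. It assumes that a bare "e-NNN" denotes 1e-NNN, as in classic BLAST reports, and that "0.0" denotes a value too small for the report to display; the helper name `parse_expect` is hypothetical.

```python
def parse_expect(cell: str) -> float:
    """Convert an Expect-value cell to a float.

    Spaces are removed first, so a scanned cell such as 'e-1 14'
    is read as 'e-114'. A leading bare exponent is given mantissa 1.
    """
    text = cell.replace(" ", "").lower()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


for cell in ("e-128", "2e-97", "0.36", "0.0"):
    print(cell, "->", parse_expect(cell))
```

With the values in numeric form, hits can be ranked or filtered against a chosen significance threshold, for example retaining only rows whose Expect value is below 1e-5.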




PFam analysis predicts that the NOV70a protein contains the domains shown in the 
Table 70E. 



Table 70E. Domain Analysis of NOV70a 


Pfam Domain 


NOV70a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


Phosducin 


64..179


33/121 (27%) 
58/121 (48%) 


0.36 



Example 71.

The NOV71 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 71A.



Table 71A. NOV71 Sequence Analysis



SEQ ID NO: 253 



6765 bp



NOV71a,

CG138808-01 DNA
Sequence



GTTAAAATGGGCAACTCCGACAGTCAGTACACCCTTCAAGGATCTAAAAATCATAGCAATACT



ATTACTGGTGCTAAGCAAATTCCTTGCTCCCTGAAAATACGTGGCATTCATGCAAAAGAGGAA 

AAGTCATTGCATGGATGGGGTCACGGAAGCAACGGAGCAGGTTACAAGTCCAGGTCCCTGGCC 

CGAAGCTGCCTTTCTCACTTTAAGAGTAACCAGCCTTACGCATCGAGACTCGGTGGCCCCACA 

TGCAAGGTCTCCAGAGGTGTTGCCTACTCCACGCACAGGACAAATGCCCCAGGGAAGGATTTC 

CAGGGCATCAGTGCTGCTTTCTCAACTGAGAATGGCTTCCACTCTGTTGGCCACGAGCTGGCA 

GATAACCACATCACCTCCAGAGACTGCAACGGACACCTTCTCAACTGCTACGGGAGGAATGAG 

AGCATTGCCTCCACCCCACCGGGCGAAGACCGCAAGAGCCCCCGAGTGCTCATCAAAACGCTG 

GGGAAGCTGGATGGGTGTTTAAGGGTCGAGTTCCACAATGGTGGCAACCCCAGCAAAGTGCCT 

GCAGAGGACTGCAGTGAGCCGGTGCAGCTGCTGAGGTACTCACCTACCTTAGCATCGGAAACC 

TCCCCTGTGCCTGAAGCCAGGAGGGGGTCCAGCGCCGATTCCCTGCCCAGCCATCGCCCCTCT 

CCCACGGACTCTCGCCTGCGGTCCAGCAAAGGCAGCTCCCTGAGTTCTGAGTCATCCTGGTAC 

GACTCCCCTTGGGGCAATGCTGGAGAGCTGAGCGAGGCTGAGGGCTCCTTCCTGGCCCCCGGC 

ATGCCTGACCCCAGTCTCCATGCCAGCTTCCCACCTGGCGATGCCAAAAAGCCTTTCAACCAA 

AGCTCTTCCCTCTCCTCCCTCCGGGAACTGTACAAAGATGCCAACCTGGGGAGCCTCTCCCCC 

TCAGGTATCCGCCTTTCTGATGAATACATGGGCACGCATGCCAGCCTGAGCAACCGTGTCTCT

TTTGCTTCCGACATTGATGTGCCCTCCAGAGTGGCACACGGGGACCCCATCCAGTACAGTTCC

TTCACTCTCCCCTGTCGGAAGCCCAAAGCCTTTGTTGAGGATACTGCGAAGAAGGACTCCCTC 

AAAGCCAGGATGCGACGGATCAGTGACTGGACGGGAAGCCTCTCAAGGAAGAAAAGGAAACTC 

CAGGAGCCGAGGTCCAAGGAGGGCAGTGACTACTTTGACAGTCGCTCTGATGGACTGAATACA 

GATGTGCAGGGATCCTCCCAGGCATCTGCTTTTCTGTGGTCAGGGGGCTCTACTCAGATCCTG 

TCTCAGAGAAGTGAATCCACACATGCGATTGGCAGCGATCCCCTCCGGCAGAACATTTATGAG 

AATTTCATGCGAGAGTTGGAAATGAGCAGGACCAACACTGAGAACATAGAAACATCTACAGAA 

ACCGCCGAGTCCAGCAGCGAGTCACTCAGCTCTCTGGAACAGCTGGATCTGCTCTTTGAGAAG 

GAACAGGGGGTGGTCCGGAAGGCCGGGTGGCTCTTCTTCAAGCCCCTGGTCACTGTGCAGAAG 

GAAAGGAAGCTTGAGCTGGTGGCACGAAGGAAATGGAAACAGTACTGGGTAACGCTGAAAGGT 

TGCACGCTGCTGTTTTATGAGACCTATGGGAAGAATTCCATGGATCAGAGCAGTGCCCCTCGG 

TGTGCTCTGTTTGCAGAAGACAGCATAGTGCAGTCTGTTCCAGAGCATCCCAAGAAAGAAAAT

GTGTTCTGCCTCAGCAACTCCTTTGGAGATGTCTACCTTTTCCAGGCCACCAGCCAGACAGAT

CTAGAAAACTGGGTCACTGCTGTACACTCTGCTTGTGCATCCCTTTTTGCAAAGAAGCATGGG 

AAAGAGGACACGCTGCGGCTGCTGAAGAACCAGACCAAAAACCTGCTTCAGAAGATAGACATG 

GACAGCAAGATGAAGAAGATGGCAGAGCTGCAGCTGTCCGTGGTGAGCGACCCAAAGAACAGG 

AAAGCCATAGAGAACCAGATCCAGCAATGGGAGCAGAATCTTGAGAAATTTCACATGGATCTG 

TTCAGGATGCGCTGCTATCTGGCCAGCCTACAAGGTGGGGAGTTACCGAACCCAAAGAGTCTC 

CTTGCAGCCGCCAGCCGCCCCTCCAAGCTGGCCCTCGGCAGGCTGGGCATCTTGTCTGTTTCC 

TCTTTCCATGCTCTGGTATGTTCTAGAGATGACTCTGCTCTCCGGAAAAGGACACTGTCACTG 

ACCCAGCGAGGGAGAAACAAGAAGGGAATATTTTCTTCGTTAAAAGGGCTGGACACACTGGCC

AGAAAAGGCAAGGAGAAGAGACCTTCTATAACTCAGGTGTTTGATTCAAGTGGCAGCCATGGA 

TTTTCTGGAACTCAGCTACCTCAAAACTCCAGTAACTCCAGTGAGGTCGATGAACTTCTGCAT

ATATATGGTTCAACAGTAGACGGTGTTCCCCGAGACAATACATGGGAAATCCAGACTTATGTC 

CACTTTCAGGACAATCACGGAGTTACTGTAGGGATCAAGCCAGAGCACAGAGTAGAAGATATT

TTGACTTTGGCATGCAAGATGAGGCAGTTGGAACCCAGCCATTATGGCCTACAGCTTCGAAAA 

TTAGTAGATGACAATGTTGAGTATTGCATCCCTGCACCATATGAATATATGCAACAACAGGTT

TATGATGAAATAGAAGTCTTTCCACTAAATGTTTATGACGTGCAGCTCACGAAGACTGGGAGT 

GTGTGTGACTTTGGGTTTGCAGTTACAGCGCAGGTGGATGAGCGTCAGCATCTCAGCCGGATA 

TTTATAAGCGACGTTCTTCCCGATGGCCTGGCGTATGGGGAAGGGCTGAGAAAGGGCAATGAG 

ATCATGACCTTAAATGGGGAAGCTGTGTCTGATCTTGACCTTAAGCAGATGGAGGCCCTGTTT

TCTGAGAAGAGCGTCGGACTCACTCTGATTGCCCGGCCTCCGGACACAAAAGCAACCCTGTGT 

ACATCCTGGTCAGACAGTGACCTGTTCTCCAGGGACCAGAAGAGTCTGCTGCCCCCTCCTAAC 

CAGTCCCAACTGCTGGAGGAATTCCTGGATAACTTTAAAAAGAATACAGCCAATGATTTCAGC 

AACGTCCCTGATATCACAACAGGTCTGAAAAGGAGTCAGACAGATGGCACTCTGGATCAGGTT 

TCCCACAGGGAGAAAATGGAGCAGACATTCAGGAGTGCTGAGCAGATCACTGCACTGTGCAGG

AGTTTTAACGACAGTCAGGCCAACGGCATGGAAGGACCGCGGGAGAATCAGGATCCTCCTCCG 

AGGCCTCTGGCCCGCCACCTGTCTGATGCAGACCGCCTCCGCAAAGTCATCCAGGAGCTTGTG 

GACACAGAGAAGTCCTACGTGAAGGATTTGAGCTGCCTCTTTGAATTATACTTGGAGCCACTT 

CAGAATGAGACCTTTCTTACCCAAGATGAGATGGAGTCACTTTTTGGAAGTTTGCCAGAGATG 

CTTGAGTTTCAGAAGGTGTTTCTGGAGACCCTGGAGGATGGGATTTCAGCATCATCTGACTTT 

AACACCCTAGAAACCCCCTCACAGTTTAGAAAATTACTGTTTTCCCTTGGAGGCTCTTTCCTT 

TATTACGCGGACCACTTTAAACTGTACAGTGGATTCTGTGCTAACCATATCAAAGTACAGAAG

GTTCTGGAGCGAGCTAAAACTGACAAAGCCTTCAAGGCTTTTCTGGACGCCCGGAACCCCACC 

AAGCAGCATTCCTCCACGCTGGAGTCCTACCTCATCAAGCCGGTTCAGAGAGTGCTCAAGTAC 

CCGCTGCTGCTCAAGGAGCTGGTGTCCCTGACGGACCAGGAGAGCGAGGAGCACTACCACCTG 

ACGGAAGCACTAAAGGCAATGGAGAAAGTAGCGAGCCACATCAATGAGATGCAGAAGATCTAT

GAGGATTATGGGACCGTGTTTGACCAGCTAGTAGCTGAGCAGAGCGGAACAGAGAAGGAGGTA 

ACAGAACTTTCGATGGGAGAGCTTCTGATGCACTCTACGGTTTCCTGGTTGAATCCATTTCTG

TCTCTAGGAAAAGCTAGAAAGGACCTTGAGCTCACAGTATTTGTTTTTAAGAGAGCCGTCATA 

CTGGTTTATAAAGAAAACTGCAAACTGAAAAAGAAATTGCCCTCGAATTCCCGGCCTGCACAC 
AACTCTACTGACTTGGACCCATTTAAATTCCGCTGGTTGATCCCCATCTCCGCGCTTCAAGTC
AGACTGGGGAATCCAGCAGGGACAGAAAATAATTCCATATGGGAACTGATCCATACGAAGTCA 
GAAATAGAAGGACGGCCAGAAACCATCTTTCAGTTGTGTTGCAGTGACAGTGAAAGCAAAACC 
AACATTGTTAAGGTGATTCGTTCTATTCTGAGGGAGAACTTCAGGCGTCACATAAAGTGTGAA 
TTACCACTGGAGAAAACGTGTAAGGATCGCCTGGTACCTCTTAAGAACCGAGTTCCTGTTTCG 
GCCAAATTAGCTTCATCCAGGTCTTTAAAAGTCCTGAAGAATTCCTCCAGCAACGAGTGGACC 
GGTGAGACTGGCAAGGGAACCTTGCTGGACTCTGACGAGGGCAGCTTGAGCAGCGGCACCCAG 
AGCAGCGGCTGCCCCACGGCTGAGGGCAGGCAGGACTCCAAGAGCACTTCTCCCGGGAAATAC 
CCACACCCCGGCTTGGCAGATTTTGCTGACAATCTCATCAAAGAGAGTGACATCCTGAGCGAT 
GAAGATGATGACCACCGTCAGACTGTGAAGCAGGGCAGCCCTACTAAAGACATCGAAATTCAG 
TTCCAGAGACTGAGGATTTCCGAGGACCCAGACGTTCACCCCGAGGCTGAGCAGCAGCCTGGC
CCGGAGTCGGGTGAGGGTCAGAAAGGAGGAGAGCAGCCCAAACTGGTCCGGGGGCACTTCTGC
CCCATTAAACGAAAAGCCAACAGCACCAAGAGGGACAGAGGAACTTTGCTCAAGGCGCAGATC
CGTCACCAGTCCCTTGACAGTCAGTCTGAAAATGCCACCATCGACCTAAATTCTGTTCTAGAG 
CGAGAATTCAGTGTCCAGAGTTTAACATCTGTTGTCAGTGAGGAGTGTTTTTATGAAACAGAG 
AGCCACGGAAAATCATAGTATGATTCAATCCAGATATGGGTTAAATTCCTCATTTTACTTTTA 
AACTGGTGGT AAAGT GG AAAT TG CAAAAAAAAAAAAAAAAAAAC TGTTC ATTCCT GGGTTTTG 


TGCAGTATACATTTTCCCACAAAATGGTTGTAAAGATTTAAGTTATTTTAATTTATTGTGGAT 


CAGAAACCTAGATGAAACTGGTCAGAATCTGTAAATTACTTAGTTTATATCCACTTTGAGCAG 


GTATCAAATGATTTAGGATCCTTAAAATTACATTCTAATAATTAAGTTATGTGGAAAAAGTAA 


GGCTGGGAAGTCGTGATTAATAGTTTTCAAAGGCCATTTTTTAAAATCCTCTGGGCATTTTCT


TT C AG CTGTTTGT TAG T T TTTG CTT T ATT T AAAG C AT ATTT AAGTT ATTT T AATG TGG TT T AG 


GG GCAAAATGTGC AG AT ACTT C ATTTTTG T AAG AT AG AT TGT AAT AG ATG CTG TT T AT AC T AA 


ACATGTCATAACTATCTATACAGTATATATTAAAAGAAAGCTTGTACTGTATCTTATTTGATG 


ATATTTATTTTCTCTGCCAAGCTGTATAGTAAAAGGAAAATAAGTCACATCTGGTCATTGGCA 


TTTGTATCGTCATTCTGTAAAGACAAAAGAGTACCTATATAAGAAGCTCCACGTAGTGCAAAT 


CGACATCTGGTAGGCTGCTCGCCCCCAGGCAGCAGCTAGAGTCTGTAATTCTCTGCGTCATCC 


TCTTCTTTTTCTTCATTTTTGCTTTTTCTTCGCTTGAGTTCTTCTCTGAAATTATATGCAAAG 


AGTTGTGGGTCTTCATCACACATTTTTCTGTATACATCACAGAGGCTCTTAAAGTGTGAGATG 




TCCAAGCGCTGCGCTTCAGGGAATAACATTCTGAGCCCTCGATGGCAGTATTTCCTTCGGAAC 


TGAAATACATTCTGAACCACTTTTTCCACCAGCTTGAATGGCTGCTCTATCTTGGGCTGTATC 


AAGGGAGTGAAGTGCACCACGCCCACGTCCACCTTCGTTGTAAGCAAACATATTATCATTCTG 


TGGCATGATATGTGGCATAGTGTGATCAATCAACTCATCCTTGTAAAACAGGAAGATGGGCTG 


TCAACAGCCTGTTTTCATAAACAGACCTTTCCACGTACTTCGGTTTCATCTCTAGGCATGGAA 


GATGGTACATTCTGGATTCGCAAATGACATGGAGAAATCAGCCGGCTGCACCTGTTCTCTAAT 


GACATCCACCAGACCTGTGCTTGATGGTCACTTAATTTTAAAACACAGTTTCACAATGGCTTA 


AAAATCAATCCAAATCAGTAAAGTCAGTCAGCAGATAATAGATGGCATTAGAATATTTTAGTT 


TTTGAATGAGGAAAAAAATAAGCTGCAGCAGCAGCTTCAAGACACAGAGAGATGGCAGACAGG 


CCCCCAGGGACCACTCAGTGCTAAACTTCCCAGATAGAGACACCACTTATTTTCGGTAGACAC 


TGATTAATCAGCTGGACTGAATTC 


ORF Start: ATG at 7 | ORF Stop: TAG at 5182




SEQ ID NO: 254 | 1725 aa | MW at 192579.1kD


NOV71a, 

CG138808-01 Protein
Sequence 


MGNSDSQYTLQGSKNHSNTITGAKQIPCSLKIRGIHAKEEKSLHGWGHGSNGAGYKSRSLARS 
CLSHFKSNQPYASRLGGPTCKVSRGVAYSTHRTNAPGKDFQGISAAFSTENGFHSVGHELADN 
HITSRDCNGHLLNCYGRNESIASTPPGEDRKSPRVLIKTLGKLDGCLRVEFHNGGNPSKVPAE 
DCSEPVQLLRYSPTLASETSPVPEARRGSSADSLPSHRPSPTDSRLRSSKGSSLSSESSWYDS 
PWGNAGELSEAEGSFLAPGMPDPSLHASFPPGDAKKPFNQSSSLSSLRELYKDANLGSLSPSG 
IRLSDEYMGTHASLSNRVSFASDIDVPSRVAHGDPIQYSSFTLPCRKPKAFVEDTAKKDSLKA 
RMRRISDWTGSLSRKKRKLQEPRSKEGSDYFDSRSDGLNTDVQGSSQASAFLWSGGSTQILSQ 
RSESTHAIGSDPLRQNIYENFMRELEMSRTNTENIETSTETAESSSESLSSLEQLDLLFEKEQ 
GVVRKAGWLFFKPLVTVQKERKLELVARRKWKQYWVTLKGCTLLFYETYGKNSMDQSSAPRCA 
LFAEDSIVQSVPEHPKKENVFCLSNSFGDVYLFQATSQTDLENWVTAVHSACASLFAKKHGKE 
DTLRLLKNQTKNLLQKIDMDSKMKKMAELQLSVVSDPKNRKAIENQIQQWEQNLEKFHMDLFR
MRCYLASLQGGELPNPKSLLAAASRPSKLALGRLGILSVSSFHALVCSRDDSALRKRTLSLTQ 
RGRNKKGIFSSLKGLDTLARKGKEKRPSITQVFDSSGSHGFSGTQLPQNSSNSSEVDELLHIY 
GSTVDGVPRDNTWEIQTYVHFQDNHGVTVGIKPEHRVEDILTLACKMRQLEPSHYGLQLRKLV 
DDNVEYCIPAPYEYMQQQVYDEIEVFPLNVYDVQLTKTGSVCDFGFAVTAQVDERQHLSRIFI 
SDVLPDGLAYGEGLRKGNEIMTLNGEAVSDLDLKQMEALFSEKSVGLTLIARPPDTKATLCTS 
WSDSDLFSRDQKSLLPPPNQSQLLEEFLDNFKKNTANDFSNVPDITTGLKRSQTDGTLDQVSH 
REKMEQTFRSAEQITALCRSFNDSQANGMEGPRENQDPPPRPLARHLSDADRLRKVIQELVDT
EKSYVKDLSCLFELYLEPLQNETFLTQDEMESLFGSLPEMLEFQKVFLETLEDGISASSDFNT 
LETPSQFRKLLFSLGGSFLYYADHFKLYSGFCANHIKVQKVLERAKTDKAFKAFLDARNPTKQ 
HSSTLESYLIKPVQRVLKYPLLLKELVSLTDQESEEHYHLTEALKAMEKVASHINEMQKIYED 
YGTVFDQLVAEQSGTEKEVTELSMGELLMHSTVSWLNPFLSLGKARKDLELTVFVFKRAVILV 
YKENCKLKKKLPSNSRPAHNSTDLDPFKFRWLIPISALQVRLGNPAGTENNSIWELIHTKSEI
EGRPETIFQLCCSDSESKTNIVKVIRSILRENFRRHIKCELPLEKTCKDRLVPLKNRVPVSAK
LASSRSLKVLKNSSSNEWTGETGKGTLLDSDEGSLSSGTQSSGCPTAEGRQDSKSTSPGKYPH
PGLADFADNLIKESDILSDEDDDHRQTVKQGSPTKDIEIQFQRLRISEDPDVHPEAEQQPGPE
SGEGQKGGEQPKLVRGHFCPIKRKANSTKRDRGTLLKAQIRHQSLDSQSENATIDLNSVLERE
FSVQSLTSVVSEECFYETESHGKS



Further analysis of the NOV71a protein yielded the following properties shown in
Table 71B.



Table 71B. Protein Sequence Properties NOV71a 


PSort 
analysis: 


0.9800 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV71a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 71C.



Table 71C. Geneseq Results for NOV71a


Geneseq 
Identifier 


Protein/Organism/Length [Patent
#, Date]


NOV71a
Residues/
Match
Residues


Identities/
Similarities for the
Matched Region


Expect
Value

AAB07497 


A T-cell lymphoma invasion and 
metastasis 2 (TIAM2) protein - Homo 
sapiens, 1077 aa. [WO200040607- 
A2, 13-JUL-2000]


649..1725
1..1077


1075/1077 (99%)
1076/1077 (99%)


0.0 


AAB07496 


A T-cell lymphoma invasion and 
metastasis 2 (TIAM2) protein - Homo 
sapiens, 1053 aa. [WO200040607- 
A2, 13-JUL-2000]


649..1725
1..1053


1052/1077 (97%) 
1052/1077(97%) 


0.0 


AAU21607 


Novel human neoplastic disease 
associated polypeptide #40 - Homo 
sapiens, 719 aa. [WO200155163-A1,
02-AUG-2001]


1007..1725
1..719 


719/719(100%) 
719/719(100%) 


0.0 


AAB07495 


A T-cell lymphoma invasion and 
metastasis 2 (TIAM2) protein - Homo 
sapiens, 626 aa. [WO200040607-A2,
13-JUL-2000]


1100..1725
1..626 


626/626(100%) 
626/626(100%) 


0.0 






AAB07494 


A T-cell lymphoma invasion and 
metastasis 2 (TIAM2) protein - Homo 
sapiens, 626 aa. [WO200040607-A2,
13-JUL-2000]


1100..1725
1..626


626/626(100%) 
626/626(100%) 


0.0 


In a BLAST search of public sequence databases, the NOV71a protein was found to
have homology to the proteins shown in the BLASTP data in Table 71D.


Table 71D. Public BLASTP Results for NOV71a


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV71a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q9WVS3 


Sif and Tiam1-like exchange factor -
Mus musculus (Mouse), 1715 aa.


1..1725
1..1715


1456/1726(84%) 
1549/1726 (89%) 


0.0 


Q9UKW0 


T-cell lymphoma invasion and 
metastasis 2 - Homo sapiens 
(Human), 1077 aa. 


649..1725
1..1077


1075/1077(99%) 
1076/1077 (99%) 


0.0 


Q9UFG6 


Hypothetical 118.0 kDa protein -
Homo sapiens (Human), 1046 aa
(fragment).


676..1725
21..1046


1024/1050 (97%)
1025/1050 (97%)


0.0


Q9UKV9


T-cell lymphoma invasion and
metastasis 2 short - Homo sapiens
(Human), 626 aa.


1100..1725
1..626


626/626 (100%)
626/626 (100%)


0.0


Q13009


T-lymphoma invasion and metastasis 
inducing protein 1 (TIAM1 protein) - 
Homo sapiens (Human), 1591 aa. 


297..1623
230..1535


593/1353 (43%) 
814/1353 (59%) 


0.0 






PFam analysis predicts that the NOV71a protein contains the domains shown in the 
Table 71E.



Table 71E. Domain Analysis of NOV71a 


Pfam Domain 


NOV71a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


PH 


507..620


26/114 (23%)
87/114 (76%)


1.8e-17


PDZ


914..999


22/89 (25%) 
59/89 (66%) 


4.8e-06 


RhoGEF 


1127..1316


70/211 (33%) 
140/211 (66%) 


3.3e-40 
Example 72.

The NOV72 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 72A. 



Table 72A. NOV72 Sequence Analysis


SEQ ID NO: 255 | 4172 bp


NOV72a,

CG139224-01 DNA
Sequence


GCGGCCGCCCGGGCTAAGAGCGGCCGGCTGGAGCCGCTGAGCCCCCGCTGCGGCCGGGAGCTG 


CATGGGGGAGCGCCGGCAGCGCTTGGGAAGATGCCCCGGCCGGAGCTGCCCCTGCCGGAGGGC 


TGGGAGGAGGCGCGCGACTTCGACGGCAAGGTCTACTACATAGACCACACGAACCGCACCACC 

AGCTGGATCGACCCGCGGGACAGGTACACCAAACCGCTCACCTTTGCTGACTGCATTAGTGAT 

GAGTTGCCGCTAGGATGGGAAGAGGCATATGACCCACAGGTTGGAGATTACTTCATAGACCAC 

AACACCAAAACCACTCAGATTGAGGATCCTCGAGTACAATGGCGGCGGGAGCAGGAACATATG 

CTGAAGGATTACCTGGTGGTGGCCCAGGAGGCTCTGAGTGCACAAAAGGAGATCTACCAGGTG 

AAGCAGCAGCGCCTGGAGCTTGCACAGCAGGAGTACCAGCAACTGCATGCCGTCTGGGAGCAT 

AAGCTGGGCTCCCAGGTCAGCGTGGTCTCTGGTTCATCATCCAGCTCCAAGTATGACCCTGAG 

ATCCTGAAAGCTGAAATTGCCACTGCAAAATCCCGGGTAAACAAGCTGAAGAGAGAGATGGTT 

CACCTCCAGCACGAGCTGCAGTTCAAAGAGCGTGGCTTTCAGACCCTGAAGAAGATCGATAAG 

AAAATGTCTGATGCTCAGGGCAGCTACAAACTGGATGAAGCTCAGGCTGTCTTGAGAGAAACA 

AAAGCCATCAAAAAGGCTATTACCTGTGGGGAAAAGGAAAAGCAAGATCTCATTAAGAGCCTT 

GCCATGTTGAAGGACGGCTTCCGCACTGACAGGGGGTCTCACTCAGACCTGTGGTCCAGCAGC 

AGCTCTCTGGAGAGTTCGAGTTTCCCGCTACCGAAACAGTACCTGGATGTGAGCTCCCAGACA 

GACATCTCGGGAAGCTTCGGCATCAACAGCAACAATCAGTTGGCAGAGAAGGTCAGATTGCGC 

CTTCGATATGAAGAGGCTAAGAGAAGGATCGCCAACCTGAAGATCCAGCTGGCCAAGCTTGAC 

AGTGAGGCCTGGCCTGGGGTGCTGGACTCAGAGAGGGACCGGCTGATCCTTATCAACGAGAAG

GAGGAGCTGCTGAAGGAGATGCGCTTCATCAGCCCCCGCAAGTGGACCCAGGGGGAGGTGGAG 

CAGCTGGAGATGGCCCGGAAGCGGCTGGAAAAGGACCTGCAGGCAGCCCGGGACACCCAGAGC 

AAGGCGCTGACGGAGAGGTTAAAGTTAAACAGTAAGAGGAACCAGCTTGTGAGAGAACTGGAG

GAAGCCACCCGGCAGGTGGCAACTCTGCACTCCCAGCTGAAAAGTCTCTCAAGCAGCATGCAG 

TCCCTGTCCTCAGGCAGCAGCCCCGGATCCCTCACGTCCAGCCGGGGCTCCCTGGTTGCATCC 

AGCCTGGACTCCTCCACTTCAGCCAGCTTCACTGACCTCTACTATGACCCCTTTGAGCAGCTG 

GACTCAGAGCTGCAGAGCAAGGTGGAGTTCCTGCTCCTGGAGGGGGCCACCGGCTTCCGGCCC 

TCAGGCTGCATCACCACCATCCACGAGGATGAGGTGGCCAAGACCCAGAAGGCAGAGGGAGGT 

GGCCGCCTGCAGGCTCTGCGTTCCCTGTCTGGCACCCCAAAGTCCATGACCTCCCTATCCCCA 

CGTTCCTCTCTCTCCTCCCCCTCCCCACCCTGTTCCCCTCTCATGGCTGACCCCCTCCTGGCT 

GGTGATGCCTTCCTCAACTCCTTGGAGTTTGAAGACCCGGAGCTGAGTGCCACTCTTTGTGAA 

CTGAGCCTTGGTAACAGCGCCCAGGAAAGATACCGGCTGGAGGAACCAGGAACGGAGGGCAAG 

CAGCTGGGCCAAGCTGTGAATACGGCCCAGGGGTGTGGCCTGAAAGTGGCCTGTGTCTCAGCC 

GCCGTATCGGACGAGTCAGTGGCTGGAGACAGTGGTGTGTACGAGGCTTCCGTGCAGAGACTG 

GGTGCTTCAGAAGCTGCTGCATTTGACAGTGACGAATCGGAAGCAGTGGGTGCGACCCGAATT 

CAGATTGCCCTGAAGTATGATGAGAAGAATAAGCAATTTGCAATATTAATCATCCAGCTGAGT 

AACCTTTCTGCTCTGTTGCAGCAACAAGACCAGAAAGTGAATATCCGCGTGGCTGTCCTTCCT

TGCTCTGAAAGCACAACCTGCCTGTTCCGGACCCGGCCTCTGGACGCCTCAGACACTCTAGTG 

TTCAATGAGGTGTTCTGGGTATCCATGTCCTATCCAGCCCTTCACCAGAAGACCTTAAGAGTC 

GATGTCTGTACCACCGACAGGAGCCATCTGGAAGAGTGCCTGGGAGGCGCCCAGATCAGCCTG

GCGGAGGTCTGCCGGTCTGGGGAGAGGTCGACTCGCTGGTACAACCTTCTCAGCTACAAATAC 

TTGAAGAAGCAGAGCAGGGAGCTCAAGCCAGTGGGAGTTATGGCCCCTGCCTCAGGGCCTGCC 

AGCACGGACGCTGTGTCTGCTCTGTTGGAACAGACAGCAGTGGAGCTGGAGAAGAGGCAGGAG 

GGCAGGAGCAGCACACAGACACTGGAAGACAGCTGGAGGTATGAGGAGACCAGTGAGAATGAG 

GCAGTAGCCGAGGAAGAGGAGGAGGAGGTGGAGGAGGAGGAGGGAGAAGAGGATGTTTTCACC 

GAGAAAGCCTCACCTGATATGGATGGGTACCCAGCATTAAAGGTGGACAAAGAGACCAACACG 

GAGACCCCGGCCCCATCCCCCACAGTGGTGCGACCTAAGGACCGGAGAGTGGGCACCCCGTCC 

CAGGGGCCATTTCTTCGAGGGAGCACCATCATCCGCTCTAAGACCTTCTCCCCAGGACCCCAG 

AGCCAGTACGTGTGCCGGCTGAATCGGAGTGATAGTGACAGCTCCACTCTGTCCAAAAAGCCA 

CCTTTTGTTCGAAACTCCCTGGAGCGACGCAGCGTCCGGATGAAGCGGCCTTCCTCGGTCAAG

TCGCTGCGCTCCGAGCGTCTGATCCGTACCTCGCTGGACCTGGAGTTAGACCTGCAGGCGACA 

AGAACCTGGCACAGCCAACTGACCCAGGAGATCTCGGTGCTGAAGGAGCTCAAGGAGCAGCTG

GAACAAGCCAAGAGCCACGGGGAGAAGGAGCTGCCACAGTGGTTGCGTGAGGACGAGCGTTTC 

CGCCTGCTGCTGAGGATGCTGGAGAAGCGGATGGACCGAGCGGAGCACAAGGGTGAGCTTCAG

ACAGACAAGATGATGAGGGCAGCTGCCAAGGATGTGCACAGGCTCCGAGGCCAGAGCTGTAAG 

GAACCCCCAGAAGTTCAGTCTTTCAGGGAGAAGATGGCATTTTTCACCCGGCCTCGGATGAAT

ATCCCAGCTCTCTCTGCAGATGACGTCTAATCGCCAGAAAAGTATTTCCTTTGTTCCACTGAC 
CAGGCTGTGAACATTGACTGTGGCTAAAGTTATTTATGTGGTGTTATATGAAGGTACTGAGTC


ACAAGTCCTCTAGTGCTCTTGTTGGTTTGAAGATGAACCGACTTTTTAGTTTGGGTCCTACTG 


TTG TT AT T AAAAAC AG AAC AAAAAC AAAACAC AC AC AC AC AC AAAAAC AG AAAC AAAAAAAA C 


CAGCATTAAAATAATAAGATTGTATAGTTTGTATATTTAGGAGTGTATTTTTGGGAAAGAAAA 


TTTAAATGAACTAAAGCAGTATTGAGTTGCTGCTCTTCTTAAAATCGTTTAGATTTTTTTTGG 


TTTGTACAGCTCCACCTTTTAGAGGTCTTACTGCAATAAGAAGTAATGCCTGGGGGACGGTAA 


TCCTAATAGGACGTCCCGCACTTGTCACAGTACAGCTAATTTTTCCTAGTTAACATATTTTGT 


ACAATATTAAAAAAATGCACAGAAACCATTGGGGGGGATTCAGAGGTGCATCCACGGATCTTC 


TTGAGCTGTGACGTGTTTTTATGTGGCTGCCCAACGTGGAGCGGGCAGTGTGATAGGCTGGGT 


GGGCTAAGCAGCCTAGTCTATGTGGGTGACAGGCCACGCTGGTCTCAGATGCCCAGTGAAGCC 


ACTAACATGAGTGAGGGGAGGGCTGTGGGGAACTCCATTCAGTTTTATCTCCATCAATAAAGT 


GGCCTTTCAAAAAG 




ORF Start: ATG at 94 | ORF Stop: TAA at 3430




SEQ ID NO: 256 | 1112 aa | MW at 125157.1kD


NOV72a, 

CG139224-01 Protein
Sequence 


MPRPELPLPEGWEEARDFDGKVYYIDHTNRTTSWIDPRDRYTKPLTFADCISDELPLGWEEAY 
DPQVGDYFIDHNTKTTQIEDPRVQWRREQEHMLKDYLVVAQEALSAQKEIYQVKQQRLELAQQ
EYQQLHAVWEHKLGSQVSVVSGSSSSSKYDPEILKAEIATAKSRVNKLKREMVHLQHELQFKE
RGFQTLKKIDKKMSDAQGSYKLDEAQAVLRETKAIKKAITCGEKEKQDLIKSLAMLKDGFRTD 
RGSHSDLWSSSSSLESSSFPLPKQYLDVSSQTDISGSFGINSNNQLAEKVRLRLRYEEAKRRI
ANLKIQLAKLDSEAWPGVLDSERDRLILINEKEELLKEMRFISPRKWTQGEVEQLEMARKRLE 

KDLQAARDTQSKALTERLKLNSKRNQLVRELEEATRQVATLHSQLKSLSSSMQSLSSGSSPGS
LTSSRGSLVASSLDSSTSASFTDLYYDPFEQLDSELQSKVEFLLLEGATGFRPSGCITTIHED
EVAKTQKAEGGGRLQALRSLSGTPKSMTSLSPRSSLSSPSPPCSPLMADPLLAGDAFLNSLEF

EDPELSATLCELSLGNSAQERYRLEEPGTEGKQLGQAVNTAQGCGLKVACVSAAVSDESVAGD 
SGVYEASVQRLGASEAAAFDSDESEAVGATRIQIALKYDEKNKQFAILIIQLSNLSALLQQQD 
QKVNIRVAVLPCSESTTCLFRTRPLDASDTLVFNEVFWVSMSYPALHQKTLRVDVCTTDRSHL 
EECLGGAQISLAEVCRSGERSTRWYNLLSYKYLKKQSRELKPVGVMAPASGPASTDAVSALLE 
QTAVELEKRQEGRSSTQTLEDSWRYEETSENEAVAEEEEEEVEEEEGEEDVFTEKASPDMDGY 
PALKVDKETNTETPAPSPTVVRPKDRRVGTPSQGPFLRGSTIIRSKTFSPGPQSQYVCRLNRS
DSDSSTLSKKPPFVRNSLERRSVRMKRPSSVKSLRSERLIRTSLDLELDLQATRTWHSQLTQE 
ISVLKELKEQLEQAKSHGEKELPQWLREDERFRLLLRMLEKRMDRAEHKGELQTDKMMRAAAK 
DVHRLRGQSCKEPPEVQSFREKMAFFTRPRMNIPALSADDV 




SEQ ID NO: 257 | 3062 bp

NOV72b, CG139224-02 DNA Sequence


GCGGCCGCCCGGGCTAAGAGCGGCCGGCTGGAGCCGCTGAGCCCCCGCTGCGGCCGGGAGCTG 


CATGGGGGAGCGCCGGCAGCGCTTGGGAAGATGCCCCGGCCGGAGCTGCCCCTGCCGGAGGGC 


TGGGAGGAGGCGCGCGACTTCGACGGCAAGGTCTACTACATAGACCACACGAACCGCACCACC
AGCTGGATCGACCCGCGGGACAGGTACACCAAACCGCTCACCTTTGCTGACTGCATTAGTGAT 
GAGTTGCCGCTAGGATGGGAAGAGGCATATGACCCACAGGTTGGAGATTACTTCATAGACCAC 
AACACCAAAACCACTCAGATTGAGGATCCTCGAGTACAATGGCGGCGGGAGCAGGAACATATG
CTGAAGGATTACCTGGTGGTGGCCCAGGAGGCTCTGAGTGCACAAAAGGAGATCTACCAGGTG 
AAGCAGCAGCGCCTGGAGCTTGCACAGCAGGAGTACCAGCAACTGCATGCCGTCTGGGAGCAT 
AAGCTGGGCTCCCAGGTCAGCGTGGTCTCTGGTTCATCATCCAGCTCCAAGTATGACCCTGAG 
ATCCTGAAAGCTGAAATTGCCACTGCAAAATCCCGGGTAAACAAGCTGAAGAGAGAGATGGTT 
CACCTCCAGCACGAGCTGCAGTTCAAAGAGCGTGGCTTTCAGACCCTGAAGAAGATCGATAAG 
AAAATGTCTGATGCTCAGGGCAGCTACAAACTGGATGAAGCTCAGGCTGTCTTGAGAGAAACA 
AAAGCCATCAAAAAGGCTATTACCTGTGGGGAAAAGGAAAAGCAAGATCTCATTAAGAGCCTT 
GCCATGTTGAAGGACGGCTTCCGCACTGACAGGGGGTCTCACTCAGACCTGTGGTCCAGCAGC 
AGCTCTCTGGAGAGTTCGAGTTTCCCGCTACCGAAACAGTACCTGGATGTGAGCTCCCAGACA 
GACATCTCGGGAAGCTTCGGCATCAACAGCAACAATCAGTTGGCAGAGAAGGTCAGATTGCGC 
CTTCGATATGAAGAGGCTAAGAGAAGGATCGCCAACCTGAAGATCCAGCTGGCCAAGCTTGAC 
AGTGAGGCCTGGCCTGGGGTGCTGGACTCAGAGAGGGACCGGCTGATCCTTATCAACGAGAAG 
GAGGAGCTGCTGAAGGAGATGCGCTTCATCAGCCCCCGCAAGTGGACCGACCCCCTCCTGGCT 
GGTGATGCCTTCCTCAACTCCTTGGAGTTTGAAGACCCGGAGCTGAGTGCCACTCTTTGTGAA 
CTGAGCCTTGGTAACAGCGCCCAGGAAAGATACCGGCTGGAGGAACCAGGAACGGAGGGCAAG 
CAGCTGGGCCAAGCTGTGAATACGGCCCAGGGGTGTGGCCTGAAAGTGGCCTGTGTCTCAGCC 
GCCGTATCGGACGAGTCAGTGGCTGGAGACAGTGGTGTGTACGAGGCTTCCGTGCAGAGACTG
GGTGCTTCAGAAGCTGCTGCATTTGACAGTGACGAATCGGAAGCAGTGGGTGCGACCCGAATT
CAGATTGCCCTGAAGTATGATGAGAAGAATAAGCAATTTGCAATATTAATCATCCAGCTGAGT
AACCTTTCTGCTCTGTTGCAGCAACAAGACCAGAAAGTGAATATCCGCGTGGCTGTCCTTCCT
TGCTCTGAAAGCACAACCTGCCTGTTCCGGACCCGGCCTCTGGACGCCTCAGACACTCTAGTG 
TTCAATGAGGTGTTCTGGGTATCCATGTCCTATCCAGCCCTTCACCAGAAGACCTTAAGAGTC 
GATGTCTGTACCACCGACAGGAGCCATCTGGAAGAGTGCCTGGGAGGCGCCCAGATCAGCCTG 
GCGGAGGTCTGCCGGTCTGGGGAGAGGTCGACTCGCTGGTACAACCTTCTCAGCTACAAATAC 
TTGAAGAAGCAGAGCAGGGAGCTCAAGCCAGTGGGAGTTATGGCCCCTGCCTCAGGGCCTGCC 


AGCACGGACGCTGTGTCTGCTCTGTTGGAACAGACAGCAGTGGAGCTGGAGAAGAGGCAGGAG 
GGCAGGAGCAGCACACAGACACTGGAAGACAGCTGGAGGTATGAGGAGACCAGTGAGAATGAG 
GCAGTAGCCGAGGAAGAGGAGGAGGAGGTGGAGGAGGAGGAGGGAGAAGAGGATGTTTTCACC 
GAGAAAGCCTCACCTGATATGGATGGGTACCCAGCATTAAAGGTGGACAAAGAGACCAACACG 
GAGACCCCGGCCCCATCCCCCACAGTGGTGCGACCTAAGGACCGGAGAGTGGGCACCCCGTCC 
CAGGGGCCATTTCTTCGAGGGAGCACCATCATCCGCTCTAAGACCTTCTCCCCAGGACCCCAG 
AGCCAGTACGTGTGCCGGCTGAATCGGAGTGATAGTGACAGCTCCACTCTGTCCAAAAAGCCA
CCTTTTGTTCGAAACTCCCTGGAGCGACGCAGCGTCCGGATGAAGCGGCCTTCCTCGGTCAAG
TCGCTGCGCTCCGAGCGTCTGATCCGTACCTCGCTGGACCTGGAGTTAGACCTGCAGGCGACA
AGAACCTGGCACAGCCAACTGACCCAGGAGATCTCGGTGCTGAAGGAGCTCAAGGAGCAGCTG 
GAACAAGCCAAGAGCCACGGGGAGAAGGAGCTGCCACAGTGGTTGCGTGAGGACGAGCGTTTC 
CGCCTGCTGCTGAGGATGCTGGAGAAGCGGATGGACCGAGCGGAGCACAAGGGTGAGCTTCAG 
ACAGACAAGATGATGAGGGCAGCTGCCAAGGATGTGCACAGGCTCCGAGGCCAGAGCTGTAAG 
GAACCCCCAGAAGTTCAGTCTTTCAGGGAGAAGATGGCATTTTTCACCCGGCCTCGGATGAAT 
ATCCCAGCTCTCTCTGCAGATGACGTCTAATCGCCAGAAAAGTATTTCCTTTGTTCCACTGAC 


CAGGCTGTGAACATTGACTGTGGCTAAAGTTATTTATGTGGTGTTATATGAAGGTACTGAGTC 


ACAAGTCCTCTAGTGCTCTTGTTGGTTTGAAGATGAACCGACTTTTTAGTTTGGGTCCTACTG 


TTGTTATTAAAAACAGAACAAAAACAAAACACACACAC 




ORF Start: ATG at 94 | ORF Stop: TAA at 2863




SEQ ID NO: 258 | 923 aa | MW at 104821.7kD

NOV72b, CG139224-02 Protein Sequence


MPRPELPLPEGWEEARDFDGKVYYIDHTNRTTSWIDPRDRYTKPLTFADCISDELPLGWEEAY
DPQVGDYFIDHNTKTTQIEDPRVQWRREQEHMLKDYLVVAQEALSAQKEIYQVKQQRLELAQQ
EYQQLHAVWEHKLGSQVSVVSGSSSSSKYDPEILKAEIATAKSRVNKLKREMVHLQHELQFKE
RGFQTLKKIDKKMSDAQGSYKLDEAQAVLRETKAIKKAITCGEKEKQDLIKSLAMLKDGFRTD 
RGSHSDLWSSSSSLESSSFPLPKQYLDVSSQTDISGSFGINSNNQLAEKVRLRLRYEEAKRRI 
ANLKIQLAKLDSEAWPGVLDSERDRLILINEKEELLKEMRFISPRKWTDPLLAGDAFLNSLEF 
EDPELSATLCELSLGNSAQERYRLEEPGTEGKQLGQAVNTAQGCGLKVACVSAAVSDESVAGD
SGVYEASVQRLGASEAAAFDSDESEAVGATRIQIALKYDEKNKQFAILIIQLSNLSALLQQQD
QKVNIRVAVLPCSESTTCLFRTRPLDASDTLVFNEVFWVSMSYPALHQKTLRVDVCTTDRSHL 
EECLGGAQISLAEVCRSGERSTRWYNLLSYKYLKKQSRELKPVGVMAPASGPASTDAVSALLE 
QTAVELEKRQEGRSSTQTLEDSWRYEETSENEAVAEEEEEEVEEEEGEEDVFTEKASPDMDGY 
PALKVDKETNTETPAPSPTVVRPKDRRVGTPSQGPFLRGSTIIRSKTFSPGPQSQYVCRLNRS
DSDSSTLSKKPPFVRNSLERRSVRMKRPSSVKSLRSERLIRTSLDLELDLQATRTWHSQLTQE 
ISVLKELKEQLEQAKSHGEKELPQWLREDERFRLLLRMLEKRMDRAEHKGELQTDKMMRAAAK 
DVHRLRGQSCKEPPEVQSFREKMAFFTRPRMNIPALSADDV 


SEQ ID NO: 259 | 3698 bp

NOV72c, CG139224-03 DNA Sequence


GCGGCCGCCCGGGCTAAGAGCGGCCGGCTGGAGCCGCTGAGCCCCCGCTGCGGCCGGGAGCTG 


CATGGGGGAGCGCCGGCAGCGCTTGGGAAGATGCCCCGGCCGGAGCTGCCCCTGCCGGAGGGC 


TGGGAGGAGGCGCGCGACTTCGACGGCAAGGTCTACTACATAGACCACACGAACCGCACCACC 
AGCTGGATCGACCCGCGGGACAGGTACACCAAACCGCTCACCTTTGCTGACTGCATTAGTGAT 
GAGTTGCCGCTAGGATGGGAAGAGGCATATGACCCACAGGTTGGAGATTACTTCATAGACCAC 
AACACCAAAACCACTCAGATTGAGGATCCTCGAGTACAATGGCGGCGGGAGCAGGAACATATG 
CTGAAGGATTACCTGGTGGTGGCCCAGGAGGCTCTGAGTGCACAAAAGGAGATCTACCAGGTG 
AAGCAGCAGCGCCTGGAGCTTGCACAGCAGGAGTACCAGCAACTGCATGCCGTCTGGGAGCAT 
AAGCTGGGCTCCCAGGTCAGCGTGGTCTCTGGTTCATCATCCAGCTCCAAGTATGACCCTGAG 
ATCCTGAAAGCTGAAATTGCCACTGCAAAATCCCGGGTAAACAAGCTGAAGAGAGAGATGGTT 
CACCTCCAGCACGAGCTGCAGTTCAAAGAGCGTGGCTTTCAGACCCTGAAGAAGATCGATAAG 
AAAATGTCTGATGCTCAGGGCAGCTACAAACTGGATGAAGCTCAGGCTGTCTTGAGAGAAACA 
AAAGCCATCAAAAAGGCTATTACCTGTGGGGAAAAGGAAAAGCAAGATCTCATTAAGAGCCTT 
GCCATGTTGAAGGACGGCTTCCGCACTGACAGGGGGTCTCACTCAGACCTGTGGTCCAGCAGC 
AGCTCTCTGGAGAGTTCGAGTTTCCCGCTACCGAAACAGTACCTGGATGTGAGCTCCCAGACA 
GACATCTCGGGAAGCTTCGGCATCAACAGCAACAATCAGTTGGCAGAGAAGGTCAGATTGCGC 
CTTCGATATGAAGAGGCTAAGAGAAGGATCGCCAACCTGAAGATCCAGCTGGCCAAGCTTGAC 
AGTGAGGCCTGGCCTGGGGTGCTGGACTCAGAGAGGGACCGGCTGATCCTTATCAACGAGAAG 
GAGGAGCTGCTGAAGGAGATGCGCTTCATCAGCCCCCGCAAGTGGACCCAGGGGGAGGTGGAG 
CAGCTGGAGATGGCCCGGAAGCGGCTGGAAAAGGACCTGCAGGCAGCCCGGGACACCCAGAGC 
AAGGCGCTGACGGAGAGGTTAAAGTTAAACAGTAAGAGGAACCAGCTTGTGAGAGAACTGGAG 
GAAGCCACCCGGCAGGTGGCAACTCTGCACTCCCAGCTGAAAAGTCTCTCAAGCAGCATGCAG 
TCCCTGTCCTCAGGCAGCAGCCCCGGATCCCTCACGTCCAGCCGGGGCTCCCTGGTTGCATCC 
AGCCTGGACTCCTCCACTTCAGCCAGCTTCACTGACCTCTACTATGACCCCTTTGAGCAGCTG
GACTCAGAGCTGCAGAGCAAGGTGGAGTTCCTGCTCCTGGAGGGGGCCACCGGCTTCCGGCCC 
TCAGGCTGCATCACCACCATCCACGAGGATGAGGTGGCCAAGACCCAGAAGGCAGAGGGAGGT 
GGCCGCCTGCAGGCTCTGCGTTCCCTGTCTGGCACCCCAAAGTCCATGACCTCCCTATCCCCA 
CGTTCCTCTCTCTCCTCCCCCTCCCCACCCTGTTCCCCTCTCATGGCTGACCCCCTCCTGGCT 


GGTGATGCCTTCCTCAACTCCTTGGAGTTTGAAGACCCGGAGCTGAGTGCCACTCTTTGTGAA 
CTGAGCCTTGGTAACAGCGCCCAGGAAAGATACCGGCTGGAGGAACCAGGAACGGAGGGCAAG 
CAGCTGGGCCAAGCTGTGAATACGGCCCAGGGGTGTGGCCTGAAAGTGGCCTGTGTCTCAGCG 
GCCGTATCGGACGAGTCAGTGGCTGGAGACAGTGGTGTGTACGAGGCTTCCGTGCAGAGACTG 
GGTGCTTCAGAAGCTGCTGCATTTGACAGTGACGAATCGGAAGCAGTGGGTGCGACCCGAATT 
CAGATTGCCCTGAAGTATGATGAGAAGAATAAGCAATTTGCAATATTAATCATCCAGCTGAGT 
AACCTTTCTGCTCTGTTGCAGCAACAAGACCAGAAAGTGAATATCCGCGTGGCTGTCCTTCCT 
TGCTCTGAAAGCACAACCTGCCTGTTCCGGACCCGGCCTCTGGACGCCTCAGACACTCTAGTG 
TTCAATGAGGTGTTCTGGGTATCCATGTCCTATCCAGCCCTTCACCAGAAGACCTTAAGAGTC 
GATGTCTGTACCACCGACAGGAGCCATCTGGAAGAGTGCCTGGGAGGCGCCCAGATCAGCCTG 
GCGGAGGTCTGCCGGTCTGGGGAGAGGTCGACTCGCTGGTACAACCTTCTCAGCTACAAATAC
TTGAAGAAACAGAGCAGGGAGCTCAAGCCAGTGGGAGTCATGGCCCCTGCCTCAGGGCCTGCC 
CGGATGAAGCGGCCTTCCTCGGTTAAGTCGCTGCGCTCCGAGCGTCTGATCCGTACCTCGCTG 
GACCTGGAGTTAGACCTGCAGGCGACAAGAACCTGGCACAGCCAATTGACCCAGGAGATCTCG 
GTGCTGAAGGAGCTCAAGGAGCAGCTGGAACAAGCCAAGAGCCACGGGGAGAAGGAGCTGCCA 
CAGTGGTTGCGTGAGGACGAGCGTTTCCGCCTGCTGCTGAGGATGCTGGAGAAGCGGCAGATG 
GACCGAGCGGAGCACAAGGGTGAGCTTCAGACAGACAAGATGATGAGGGCAGCTGCCAAGGAT 
GTGCACCAGCTCCGAGGCCAGAGCTGTAAGGAACCCCCAGAAGTTCAGTCTTTCAGGGAGAAG 
ATGGCATTTTTCACCCGGCCTCGGATGAATATCCCAGCTCTCTCTGCAGATGACGTCTAATCG
CCAGAAAAGTATTTCCTTTGTTCCACTGACCAGGCTGTGAACATTGACTGTGGCTAAAGTTAT 


TTATGTGGTGTTATATGAAGGTACTGAGTCACAAGTCCTCTAGTGCTCTTGTTGGTTTGAAGA 




CACACACACAAAAACAGAAACAAAAAAAACCAGCATTAAAATAATAAGATTGTATAGTTTGTA 


TATTTAGGAGTGTATTTTTGGGAAAGAAAATTTAAATGAACTAAAGCAGTATTGAGTTGCTGC 


TCTTCTTAAAATCGTTTAGATTTTTTTTGGTTTGTACAGCTCCACCTTTTAGAGGTCTTACTG 


CAATAAGAAGTAATGCCTGGGGGACGGTAATCCTAATAGGACGTCCCGCACTTGTCACAGTAC 


AGCTAATTTTTCCTAGTTAACATATTTTGTACAATATTAAAAAAATGCACAGAAACCATTGGG 


GGGGATTCAGAGGTGCATCCACGGATCTTCTTGAGCTGTGACGTGTTTTTATGTGGCTGCCCA 


ACGTGGAGCGGGCAGTGTGATAGGCTGGGTGGGCTAAGCAGCCTAGTCTATGTGGGTGACAGG 


CCACGCTGGTCTCAGATGCCCAGTGAAGCCACTAACATGAGTGAGGGGAGGGCTGTGGGGAAC 


TCCATTCAGTTTTATCTCCATCAATAAAGTGGCCTTTCAAAAAG




ORF Start: ATG at 94 | ORF Stop: TAA at 2956




SEQ ID NO: 260 | 954 aa | MW at 107489.2kD

NOV72c, CG139224-03 Protein Sequence


MPRPELPLPEGWEEARDFDGKVYYIDHTNRTTSWIDPRDRYTKPLTFADCISDELPLGWEEAY 
DPQVGDYFIDHNTKTTQIEDPRVQWRREQEHMLKDYLVVAQEALSAQKEIYQVKQQRLELAQQ
EYQQLHAVWEHKLGSQVSVVSGSSSSSKYDPEILKAEIATAKSRVNKLKREMVHLQHELQFKE
RGFQTLKKIDKKMSDAQGSYKLDEAQAVLRETKAIKKAITCGEKEKQDLIKSLAMLKDGFRTD

RGSHSDLWSSSSSLESSSFPLPKQYLDVSSQTDISGSFGINSNNQLAEKVRLRLRYEEAKRRI 
ANLKIQLAKLDSEAWPGVLDSERDRLILINEKEELLKEMRFISPRKWTQGEVEQLEMARKRLE 
KDLQAARDTQSKALTERLKLNSKRNQLVRELEEATRQVATLHSQLKSLSSSMQSLSSGSSPGS 
LTSSRGSLVASSLDSSTSASFTDLYYDPFEQLDSELQSKVEFLLLEGATGFRPSGCITTIHED 
EVAKTQKAEGGGRLQALRSLSGTPKSMTSLSPRSSLSSPSPPCSPLMADPLLAGDAFLNSLEF 
EDPELSATLCELSLGNSAQERYRLEEPGTEGKQLGQAVNTAQGCGLKVACVSAAVSDESVAGD 
SGVYEASVQRLGASEAAAFDSDESEAVGATRIQIALKYDEKNKQFAILIIQLSNLSALLQQQD 
QKVNIRVAVLPCSESTTCLFRTRPLDASDTLVFNEVFWVSMSYPALHQKTLRVDVCTTDRSHL 
EECLGGAQISLAEVCRSGERSTRWYNLLSYKYLKKQSRELKPVGVMAPASGPARMKRPSSVKS 
LRSERLIRTSLDLELDLQATRTWHSQLTQEISVLKELKEQLEQAKSHGEKELPQWLREDERFR 
LLLRMLEKRQMDRAEHKGELQTDKMMRAAAKDVHQLRGQSCKEPPEVQSFREKMAFFTRPRMN 
IPALSADDV 
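
The ORF coordinates listed in Table 72A can be cross-checked against the listed protein lengths. The following is a minimal sketch, assuming (as the listed values suggest) that coordinates are 1-based, that the start coordinate is the first base of the ATG and that the stop coordinate is the first base of the stop codon. The function name is illustrative and is not part of the original disclosure.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Return the number of residues encoded between a start and a stop codon.

    Assumption: `start` is the 1-based position of the first base of ATG and
    `stop` is the 1-based position of the first base of the stop codon, which
    is itself not translated.
    """
    coding_bases = stop - start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_bases // 3


# ORF coordinates and protein lengths as listed for the NOV72 variants.
listed = [
    ("NOV72a", 94, 3430, 1112),
    ("NOV72b", 94, 2863, 923),
    ("NOV72c", 94, 2956, 954),
]
for name, start, stop, length in listed:
    print(name, orf_protein_length(start, stop), length)
```

Under these assumptions the computed lengths agree with the 1112, 923 and 954 residue lengths given for SEQ ID NOs: 256, 258 and 260.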


Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 72B. 



Table 72B. Comparison of NOV72a against NOV72b and NOV72c.

Protein Sequence | NOV72a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV72b | 553..1112 / 364..923 | 483/560 (86%) / 483/560 (86%)
NOV72c | 1..870 / 1..871 | 678/871 (77%) / 691/871 (78%)



Further analysis of the NOV72a protein yielded the following properties shown in 
Table 72C. 



Table 72C. Protein Sequence Properties NOV72a

PSort analysis: 0.7600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV72a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 72D.



Table 72D. Geneseq Results for NOV72a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV72a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAU74354 | Human cytoskeleton-associated protein (CYSKP) #25 - Homo sapiens, 912 aa. [WO200185942-A2, 15-NOV-2001] | 202..1112 / 1..912 | 910/912 (99%) / 910/912 (99%) | 0.0
ABB11742 | Human KIAA0869 protein homologue, SEQ ID NO:2112 - Homo sapiens, 894 aa. [WO200157188-A2, 09-AUG-2001] | 225..1112 / 1..894 | 887/894 (99%) / 887/894 (99%) | 0.0
AAB93267 | Human protein sequence SEQ ID NO:12300 - Homo sapiens, 379 aa. [EP1074617-A2, 07-FEB-2001] | 734..1112 / 1..379 | 379/379 (100%) / 379/379 (100%) | 0.0
AAB42194 | Human ORFX ORF1958 polypeptide sequence SEQ ID NO:3916 - Homo sapiens, 342 aa. [WO200058473-A2, 05-OCT-2000] | 777..1112 / 1..342 | 335/343 (97%) / 335/343 (97%) | 0.0
AAB43089 | Human ORFX ORF2853 polypeptide sequence SEQ ID NO:5706 - Homo sapiens, 202 aa. [WO200058473-A2, 05-OCT-2000] | 1..170 / 23..192 | 167/170 (98%) / 168/170 (98%) | 7e-96
In a BLAST search of public sequence databases, the NOV72a protein was found to
have homology to the proteins shown in the BLASTP data in Table 72E.



Table 72E. Public BLASTP Results for NOV72a

Protein Accession Number | Protein/Organism/Length | NOV72a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
O94946 | KIAA0869 protein - Homo sapiens (Human), 888 aa (fragment). | 225..1112 / 1..888 | 888/888 (100%) / 888/888 (100%) | 0.0
Q922W3 | Unknown (Protein for IMAGE:3963643) - Mus musculus (Mouse), 967 aa (fragment). | 145..1112 / 8..967 | 878/970 (90%) / 909/970 (93%) | 0.0
Q8VD17 | Hypothetical 90.4 kDa protein - Mus musculus (Mouse), 812 aa (fragment). | 293..1112 / 1..812 | 738/822 (89%) / 764/822 (92%) | 0.0
Q9BT29 | Hypothetical 38.0 kDa protein - Homo sapiens (Human), 332 aa (fragment). | 782..1112 / 1..332 | 331/332 (99%) / 331/332 (99%) | 0.0
Q8WVM4 | Hypothetical 32.9 kDa protein - Homo sapiens (Human), 285 aa (fragment). | 829..1112 / 1..285 | 284/285 (99%) / 284/285 (99%) | e-159



PFam analysis predicts that the NOV72a protein contains the domains shown in
Table 72F.



Table 72F. Domain Analysis of NOV72a

Pfam Domain | NOV72a Match Region | Identities/Similarities for the Matched Region | Expect Value
WW | 8..37 | 15/30 (50%) / 27/30 (90%) | 1.7e-11
WW | 55..84 | 17/30 (57%) / 25/30 (83%) | 3.9e-07



Example 73. 

The NOV73 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 73A.

Table 73A. NOV73 Sequence Analysis


SEQ ID NO: 261 | 1455 bp

NOV73a, CG140088-01 DNA Sequence

TCCTTGGCAGCCTTGGCAGCCACCTTGACACTCTCCTGTCTCCCCACCTCCACAGAGACAATG 


ACCATGTTTGAAAATGTCACCCGGGCCCTGGCCAGACAGCTAAACCCTCGAGGGGACCTGACA 
CCACTTGACAGCCTCATCGACTTCAAGCGCTTCCATCCCTTCTGCCTGGTGCTGAGGAAGAGG 
AAGAGCACGCTCTTCTGGGGGGCCCGGTACGTCCGCACCGACTACACGCTGCTGGATGTGCTT 
GAGCCCGGCAGCTCACCTTCAGACCCAACAGACACTGGGAATTTTGGCTTTAAGAATATGCTG
GACACCCGAGTGGAGGGAGATGTGGATGTACCAAAGACGGTGAAGGTGAAGGGAACGGCAGGG 
CTCTCGCAGAACAGCACTCTGGAGGTCCAGACACTCAGTGTGGCTCCCAAGGCCCTGGAGACC 
GTGCAGGAGAGGAAGCTGGCAGCAGACCACCCATTCCTGAAGGAGATGCAAGATCAAGGGGAG 
AACCTGTATGTGGTGATGGAGGTGGTGGAGACGGTGCAGGAGGTCACACTGGAGCGAGCCGGC 
AAGGCAGAGGCCTGCTTCTCCCTCCCCTTCTTCGCCCCATTGGGGCTACAGGGATCCATAAAT 
CACAAGGAGGCTGTAACCATCCCCAAGGGCTGCGTCCTGGCCTTTCGAGTGAGACAGCTGATG 
GTCAAAGGCAAAGATGAGTGGGATATTCCACATATCTGCAATGATAACATGCAAACCTTCCCT 
CCTGGAGAGGAAGGTGCCCGATGGAGGGCTGTGTGTCCCATTATTGTCCCCATAGGGGACGTA 
CACGAAGGCTTCAGGACACTAAAAGAAGAAGTTCAGAGAGAGACCCAACAAGTGGAGAAGCTG 
AGCCGAGTAGGGCAAAGCTCCCTGCTCAGCTCCCTCAGCAAACTTCTAGGGAAGAAAAAGGAG
CTACAAGACCTTGAGCTCGCACTTGAAGGGGCTCTAGACAAGGGACATGAAGTGACCCTGGAG
GCACTCCCAAAAGATGTCCTGCTATCAAAGGAGGCCGTGGGCGCCATCCTCTATTTCGTTGGA 
GCCCTAACAGAGCTAAGTGAAGCCCAACAGAAGCTGCTGGTGAAATCCATGGAGAAAAAGATC 
CTACCCGTGCAGCTAAAGCTGGTGGAGAGCACGATGGAACAGAACTTCCTGCTGGATAAAGAG
GGTGTTTTCCCCCTGCAACCTGAGCTGCTCTCCTCCCTTGGGGACGAGGAGCTGACCCTCACG 
GAGGCTCTAGTCGGGCTGAGTGGCCTGGAAGTGCAGAGATCGGGCCCCCAATATATGTGGGAC 
CCAGACACCCTCCCTCGCCTCTGTGCTCTTTATGCAGGCCTCTCTCTCCTTCAGCAGCTTACC 
AAGGCCTCCTAATTTGCCTTTTACGTCTGCTTCATGACTCCCTAATGCCTTCCCAACCTCGTG 


GTGCTG 




ORF Start: ATG at 61 | ORF Stop: TAA at 1396




SEQ ID NO: 262 | 445 aa | MW at 49428.5kD

NOV73a, CG140088-01 Protein Sequence


MTMFENVTRALARQLNPRGDLTPLDSLIDFKRFHPFCLVLRKRKSTLFWGARYVRTDYTLLDV 
LEPGSSPSDPTDTGNFGFKNMLDTRVEGDVDVPKTVKVKGTAGLSQNSTLEVQTLSVAPKALE
TVQERKLAADHPFLKEMQDQGENLYVVMEVVETVQEVTLERAGKAEACFSLPFFAPLGLQGSI
NHKEAVTIPKGCVLAFRVRQLMVKGKDEWDIPHICNDNMQTFPPGEEGARWRAVCPIIVPIGD 
VHEGFRTLKEEVQRETQQVEKLSRVGQSSLLSSLSKLLGKKKELQDLELALEGALDKGHEVTL 
EALPKDVLLSKEAVGAILYFVGALTELSEAQQKLLVKSMEKKILPVQLKLVESTMEQNFLLDK 
EGVFPLQPELLSSLGDEELTLTEALVGLSGLEVQRSGPQYMWDPDTLPRLCALYAGLSLLQQL 
TKAS 




SEQ ID NO: 263 | 1386 bp

NOV73b, CG140088-02 DNA Sequence


CTCCACAGAGACAATGACCATGTTTGAAAATGTCACCCGGGCCCTGGCCAGACAGCTAAACCC 
TCGAGGGGACCTGACACCACTTGACAGCCTCATCGACTTCAAGCGCTTCCATCCCTTCTGCCT 
GGTGCTGAGGAAGAGGAAGAGCACGCTCTTCTGGGGGGCCCGGTACGTCCGCACCGACTACAC 
GCTGCTGGATGTGCTTGAGCCCGGCAGCTCACCTTCAGACCCAACAGACACTGGGAATTTTGG 
CTTTAAGAATATGCTGGACACCCGAGTGGAGGGAGATGTGGATGTACCAAAGACGGTGAAGGT 
GAAGGGAACGGCAGGGCTCTCGCAGAACAGCACTCTGGAGGTCCAGACACTCAGTGTGGCTCC 
CAAGGCCCTGGAGACCTTGCAGAAGAGGAAGCTGGCAGCAGACCACCCATTCCTGAAGGAGAT 
GCAAGATCAAGGGGAGAACCTGTATGTGGTGATGGAGGTGGTGGAGACGGTGCAGGAGGTCAC 
ACTGGAGCGAGCCGGCAAGGCAGAGGCCTGCTTCTCCCTCCCCTTCTTCGCCCCATTGGGGCT 
ACAGGGATCCATAAATCACAAGGAGGCTGTAACCATCCCCAAGGGCTGCGTCCTGGCCTTTCG 
AGTGAGACAGCTGATGGTCAAAGGCAAAGATGAGTGGGATATTCCACATATCTGCAATGATAA 
CATGCAAACCTTCCCTCCTGGAGAAAAGTCAGGAGAGGAGAAGGTCATCCTTATCCAGGCATC 
TGATGTTGGGGACGTACACGAAGGCTTCAGGACACTAAAAGAAGAAGTTCAGAGAGAGACCCA
ACAAGTGGAGAAGCTGAGCCGAGTAGGGCAAAGCTCCCTGCTCAGCTCCCTCAGCAAACTTCT 
AGGGAAGAAAAAGGAGCTACAAGACCTTGAGCTCGCACTTGAAGGGGCTCTAGACAAGGGACA
TGAAGTGACCCTGGAGGCACTCCCAAAAGATGTCCTGCTATCAAAGGAGGCCGTGGGCGCCAT 
CCTCTATTTCGTTGGAGCCCTAACAGAGCTAAGTGAAGCCCAACAGAAGCTGCTGGTGAAATC 
CATGGAGAAAAAGATCCTACCCGTGCAGCTAAAGCTGGTGGAGAGCACGATGGAACAGAACTT 
CCTGCTGGATAAAGAGGGTGTTTTCCCCCTGCAACCTGAGCTGCTCTCCTCCCTTGGGGACGA 
GGAGCTGACCCTCACGGAGGCTCTAGTCGGGCTGAGTGGCCTGGAAGTGCAGAGATCGGGCCC 
CCAATATATGTGGGACCCAGACACCCTCCCTCGCCTCTGTGCTCTTTATGCAGGCCTCTCTCT 
CCTTCAGCAGCTTACCAAGGCCTCCTAATTTGCCTTTTACGTCTGCTTCATGACTCCCTAATG




ORF Start: ATG at 14 | ORF Stop: TAA at 1349




SEQ ID NO: 264 | 445 aa | MW at 49377.4kD

NOV73b, CG140088-02 Protein Sequence



MTMFENVTRALARQLNPRGDLTPLDSLIDFKRFHPFCLVLRKRKSTLFWGARYVRTDYTLLDV 
LEPGSSPSDPTDTGNFGFKNMLDTRVEGDVDVPKTVKVKGTAGLSQNSTLEVQTLSVAPKALE 
TLQKRKLAADHPFLKEMQDQGENLYVVMEVVETVQEVTLERAGKAEACFSLPFFAPLGLQGSI
NHKEAVTIPKGCVLAFRVRQLMVKGKDEWDIPHICNDNMQTFPPGEKSGEEKVILIQASDVGD 
VHEGFRTLKEEVQRETQQVEKLSRVGQSSLLSSLSKLLGKKKELQDLELALEGALDKGHEVTL 
EALPKDVLLSKEAVGAILYFVGALTELSEAQQKLLVKSMEKKILPVQLKLVESTMEQNFLLDK 
EGVFPLQPELLSSLGDEELTLTEALVGLSGLEVQRSGPQYMWDPDTLPRLCALYAGLSLLQQL 
TKAS 
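
The relationship between a listed nucleotide sequence, its ORF start coordinate and the encoded polypeptide can be illustrated by translating the opening of the NOV73b DNA sequence. The following is a minimal sketch, assuming the standard genetic code and a 1-based ORF start coordinate. The helper names are illustrative, and only the first 63-base line of SEQ ID NO: 263 is used.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, orf_start: int) -> str:
    """Translate `dna` from the 1-based position `orf_start`.

    Translation stops at the first stop codon, or when fewer than three
    bases remain.
    """
    seq = dna.upper()
    residues = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First line of the NOV73b DNA sequence; the listed ORF start is ATG at 14.
first_line = "CTCCACAGAGACAATGACCATGTTTGAAAATGTCACCCGGGCCCTGGCCAGACAGCTAAACCC"
print(translate(first_line, 14))
```

The sixteen residues obtained, MTMFENVTRALARQLN, correspond to the beginning of the NOV73b protein sequence listed as SEQ ID NO: 264.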



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 73B. 



Table 73B. Comparison of NOV73a against NOV73b.

Protein Sequence | NOV73a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV73b | 1..445 / 1..445 | 398/445 (89%) / 404/445 (90%)
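
The percentages in the comparison tables follow from the ratio of identical residues to the length of the matched region. The following is a minimal sketch; it assumes the percentage is truncated to a whole number rather than rounded, which is an inference from the identity values listed in these tables and not a statement about the alignment program used. The function name is illustrative.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Return the whole-number percentage of matching positions.

    Assumption: the value is truncated (floored), not rounded.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Identity counts as listed in Tables 73B, 72B and 72D.
print(percent_identity(398, 445))  # NOV73a against NOV73b
print(percent_identity(483, 560))  # NOV72a against NOV72b
print(percent_identity(910, 912))  # NOV72a against AAU74354
```

With truncation these three cases give 89, 86 and 99, matching the listed identity percentages.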



Further analysis of the NOV73a protein yielded the following properties shown in 
Table 73C. 



Table 73C. Protein Sequence Properties NOV73a

PSort analysis: 0.3600 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.3000 probability located in nucleus; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV73a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 73D.



Table 73D. Geneseq Results for NOV73a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV73a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB93904 | Human protein sequence SEQ ID NO:13862 - Homo sapiens, 484 aa. [EP1074617-A2, 07-FEB-2001] | 4..443 / 5..481 | 157/480 (32%) / 242/480 (49%) | 3e-50
ABB90142 | Human polypeptide SEQ ID NO 2518 - [WO200190304-A2, 29-NOV-2001] | 4..433 / 5..469 | 149/470 (31%) / 233/470 (48%) | 6e-45
AAB66866 | Human peptidyl-prolyl isomerase-2 - Homo sapiens, 443 aa. [US6171843-B1, 09-JAN-2001] | 4..373 / 5..410 | 128/410 (31%) / 202/410 (49%) | 9e-34
ABB97522 | Novel human protein SEQ ID NO: 790 - Homo sapiens, 403 aa. [WO200222660-A2, 21-MAR-2002] | 2..443 / 3..396 | 134/462 (29%) / 219/462 (47%) | 1e-30
ABB72295 | Murine protein isolated from skin cells SEQ ID NO: 507 - Mus sp, 244 aa. [WO200190357-A1, 29-NOV-2001] | 247..445 / 23..232 | 82/212 (38%) / 118/212 (54%) | 3e-23



In a BLAST search of public sequence databases, the NOV73a protein was found to
have homology to the proteins shown in the BLASTP data in Table 73E.



Table 73E. Public BLASTP Results for NOV73a

Protein Accession Number | Protein/Organism/Length | NOV73a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
BAC04790 | CDNA FLJ39120 fis, clone NTONG2006646, highly similar to Mus musculus Gd gasdermin - Homo sapiens (Human), 445 aa. | 1..445 / 1..445 | 428/445 (96%) / 433/445 (97%) | 0.0
Q96QA5 | Gastric cancer-related protein FKSG9 - Homo sapiens (Human), 446 aa. | 1..445 / 1..446 | 421/446 (94%) / 428/446 (95%) | 0.0
Q9EST1 | Gasdermin - Mus musculus (Mouse), 446 aa. | 1..445 / 1..446 | 382/446 (85%) / 413/446 (91%) | 0.0
Q9D810 | 2200001G21Rik protein - Mus musculus (Mouse), 276 aa. | 186..445 / 19..276 | 191/260 (73%) / 220/260 (84%) | e-101
Q9D8T2 | 1810036L03Rik protein - Mus musculus (Mouse), 487 aa. | 4..441 / 5..482 | 152/481 (31%) / 243/481 (49%) | 1e-47



PFam analysis predicts that the NOV73a protein contains the domains shown in
Table 73F.



Table 73F. Domain Analysis of NOV73a

Pfam Domain | NOV73a Match Region | Identities/Similarities for the Matched Region | Expect Value
Example 74.

The NOV74 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 74A. 



Table 74A. NOV74 Sequence Analysis

SEQ ID NO: 265 | 11124 bp

NOV74a, CG140170-01 DNA Sequence


GAAATTTTAATTTATTATTCCCCCCTTTTTTTCCTGCATCTATAGGATAATATTGTAAAATAG 


CAATTGAAACCAATAATCATTAAATAAATATCAAGGAAAATCCAAGCAAAGCTTTCTTTTTGT


TGGACTAGTGGTGTGGTGTTTGGAGACAGTCTCTGAATGTGAACAGGAAAGCACCCATCAGCA 


AAACACGATCACTCTCTAGGGAGACAGCTGGGGGAATCTGACTCTGGCTTCTGCTTTTGTTTT 


AAGGGATTAACTTCCCTGTCAAGTCCAAGAAGACTTGCGTATGAGAAGATTACCTGATGGACT 


TAATTCTAAGATTAGCTTTTTTCATCAAGATGGAAAAAGATCTTTAGGAGCAGAAAAGGGGAG


TGCTAACTGGGGGAGCGAGAAGGGAGACGAGCAAAAGAAACAAAATCTTGCCACGTGGCTCTG 


TTTTGTCAGCAAGAGGATTTAAGACTCACCCAGGGCAAACACTGGGACCACTGTAAGAGCGCT 


GGAACATTCTGCCTCTTGAGTGAAGGGGCCTTCTTTCTAGCCTCTATGGCACTGAGGGGTGCG 


CCGGCTGGTGGAGGAGTAGTCCGATGGAGCCCTGCGTTCCCCGGGGACACAGGGCCAAGCTTT 


GAGGTGGAAAGTTTCTGGTTCTGAAACAACAAGGAGAGAGTCTGTTTTTCTTCCTAAAATTTG 


GACTCTTGTCTGCACAAACTCTGGTCTGTTTTGCACGGTTTGTGTGCCTTTTTTTCCCTTTAT


GCAATCTTTTTCAGCTTTAGCAGCAGAAATTTGTCTAGTTCAGGAAACATGCTAGAGGGTGGC 


TTCAGAAGGAAGATGATCCTGTGTATTCTGTCTCTGCATCCGAACTTTTGAAGAGAAAAATTC 


GAGCTAGAGGGATTCTTAAAGCCTTAAGTTACTTGAAATCTATGTATTTGCAACCCTTTGTCT 


CTGGAATCATATTACACTAAACTGGAATCTCAGGCTGAATGAGAATAACCAAGTGGAGTAAAA 


AGAAGAAAACCGTTTCTTGATCACCACTTAATTAACGATGCTCTTTCTCCAAAGGATCAGCAC 


GTTCTTCCTCTGAGAACTTGAAAATACAAATGGACCCCATGTTTTTTTAAGCATTACCTTTTC


TTAGAAGACTGCCATCATCTTTTATAGAGGAATTTTTTCACTATGCATTCAGTGGATCTTTAT 


AAAATACTGACCTTCTAATTAGATTCAGGTCAGTCTTAATTAAAGGGGGAAAAAAGCAACGCA 


AGCCAACCACAAAAACACATATACCAATGAAAGAAATTGGTTTAAATTTCACAGCATTAACAT 


TACTTTTTAAGTAAAACAGTTCATTGAAGAAAGTATGTATGCAGCAGTGGAACATGGGCCTGT




GCTTTGCAGTGACTCCAACATCCTGTGCCTGTCCTGGAAGGGGCGTGTCCCCAAGAGTGAGAA 
GGAGAAGCCTGTGTGCAGGAGACGCTACTATGAGGAAGGCTGGCTGGCCACGGGCAACGGGCG 
AGGAGTGGTTGGGGTGACTTTCACCTCTAGTCACTGTCGCAGGGACAGGAGTACTCCACAGAG 
GATAAATTTCAACCTCCGGGGCCACAATAGCGAGGTTGTGCTGGTGAGGTGGAATGAGCCCTA 
CCAGAAACTGGCCACGTGCGATGCGGACGGAGGCATATTCGTGTGGATTCAGTACGAGGGCAG 
GTGGTCTGTGGAGCTGGTCAACGACCGCGGGGCGCAGGTGAGTGATTTCACGTGGAGCCATGA 
TGGAACTCAAGCACTTATTTCCTATCGAGATGGGTTTGTCCTGGTTGGGTCTGTCAGTGGACA 
AAGACACTGGTCATCCGAAATCAACTTGGAAAGTCAAATTACGTGTGGCATATGGACTCCTGA
CGACCAACAGGTGCTGTTTGGCACGGCCGATGGGCAGGTGATTGTCATGGATTGCCACGGCAG 
AATGCTGGCCCACGTCCTCTTGCACGAGTCAGACGGTGTCCTCGGCATGTCCTGGAACTACCC 
GATCTTCCTGGTGGAGGACAGCAGCGAGAGCGACACGGACTCAGATGACTACGCCCCTCCCCA
AGATGGTCCGGGAGCATATCCCATCCCAGTGCAGAACATCAAGCCTCTGCTCACCGTCAGCTT
CACCTCGGGAGACATCAGCTTAATGAACAACTACGATGACTTGTCTCCCACGGTCATCCGCTC 
AGGGCTGAAAGAGGTGGTAGCCCAGTGGTGCACACAGGGGGACTTGCTGGCAGTCGCTGGGAT 
GGAACGGCAGACCCAGCTTGGTGAGCTTCCCAATGGTCCCCTTCTGAAGAGTGCCATGGTCAA
GTTCTACAATGTTCGTGGGGAGCACATCTTCACACTGGACACTCTCGTGCAGCGCCCCATCAT 
CTCCATCTGCTGGGGTCACCGGGATTCGAGGCTGTTGATGGCATCAGGACCAGCCCTGTACGT 
GGTGCGTGTGGAGCACCGGGTGTCCAGCCTGCAGCTGCTGTGCCAGCAGGCCATCGCCAGCAC 
CTTGCGTGAGGACAAGGACGTCAGCAAGCTGACTCTGCCCCCCCGCCTCTGCTCCTACCTCTC 
CACTGCCTTCATCCCCACCATCAAGCCCCCAATTCCAGATCCGAACAACATGAGAGACTTTGT 
CAGCTACCCATCAGCCGGCAACGAGCGGCTGCACTGCACCATGAAGCGCACAGAGGACGACCC 
GGAGGTGGGCGGCCCGTGCTACACGCTCTACCTGGAGTACCTGGGCGGGCTTGTGCCCATCCT 
CAAAGGGCGGCGCATCAGCAAGCTGCGGCCAGAGTTCGTCATCATGGACCCGCGGACAGATAG 
CAAACCAGATGAAATCTATGGGAACAGCTTGATTTCTACTGTGATCGACAGCTGCAACTGCTC 
AGACTCCAGTGACATTGAGCTGAGTGATGACTGGGCTGCCAAGAAATCTCCCAAAATCTCCAG 
AGCTAGCAAATCACCCAAACTCCCAAGGATCAGCATTGAGGCCCGCAAGTCACCCAAGCTGCC
CCGGGCTGCTCAGGAGCTCTCCCGGTCCCCACGGTTGCCCCTGCGCAAGCCCTCTGTGGGCTC 
GCCCAGCCTGACTCGGAGAGAGTTTCCTTTTGAAGACATCACTCACCCCACCTATCTTGCTCA 
GGTCACGTCTAATATCTGGGGAACCAAATTTAAGATTGTGGGCTTGGCTGCTTTCCTGCCAAC 
CAACCTCGGTGCAGTAATCTATAAAACCAGCCTCCTGCATCTCCAGCCGCGGCAGATGACCAT
TTATCTCCCAGAAGTTCGGAAAATTTCCATGGACTATATTAATTTACCTGTCTTCAACCCAAA 
TGTTTTCAGTGAAGATGAAGATGATTTACCAGTGACAGGAGCATCTGGTGTCCCTGAGAACAG 
CCCACCTTGTACCGTGAACATCCCTATTGCACCGATCCACAGCTCGGCTCAGGCTATGTCCCC 
CACGCAGAGCATAGGGCTGGTGCAGTCCCTACTGGCCAATCAGAATGTGCAGCTAGATGTCCT
GACCAACCAGACGACAGCTGTAGGGACAGCAGAACATGCAGGTGACAGTGCCACCCAGTACCC
AGTCTCCAACCGGTACTCCAATCCTGGACAGGTGATTTTCGGAAGCGTGGAAATGGGCCGCAT
CATTCAGAACCCCCCTCCACTGTCCCTGCCTCCCCCGCCGCAGGGGCCCATGCAGCTGTCCAC
GGTGGGCCATGGAGACCGAGACCACGAACACCTGCAGAAGTCAGCCAAGGCCCTGCGGCCAAC 
ACCGCAGCTGGCAGCTGAGGGGGACGCAGTGGTCTTTAGTGCCCCCCAGGAGGTCCAGGTGAC
CAAGATAAACCCTCCACCCCCGTACCCAGGAACCATCCCCGCTGCCCCCACCACAGCAGCACC 
CCCGCCCCCTCTGCCGCCCCCACAGCCCCCAGTGGATGTGTGCTTGAAGAAGGGCGACTTCTC 
CCTCTACCCCACGTCAGTGCACTACCAGACCCCCCTGGGCTATGAGAGGATCACCACCTTCGA 
CAGCAGTGGCAACGTGGAGGAGGTGTGCCGGCCCCGCACCCGGATGCTGTGCTCCCAGAACAC 
CTACACCCTCCCCGGCCCGGGTAGCTCTGCCACCTTGAGGCTCACGGCCACTGAGAAGAAGGT
CCCTCAGCCCTGCAGCAGTGCCACCCTGAACCGCCTGACCGTCCCTCGCTACTCCATCCCCAC
CGGGGACCCACCCCCGTATCCTGAAATTGCCAGCCAGCTGGCCCAGGGGCGGGGGGCTGCCCA
GAGGTCCGACAATAGCCTCATCCACGCTACCCTGCGGAGGAACAACCGTGAGGCTACGCTCAA
GATGGCCCAGCTGGCCGACAGCCCGCGGGCCCCCCTGCAGCCCCTdGCCAAGTCCAAGGGCGG 
GCCCGGGGGGGTGGTGACACAGCTCCCAGCGCGGCCCCCACCTGCCCTGTACACCTGCAGTCA
GTGCAGTGGCACAGGGCCCAGCTCACAGCCCGGAGCCTCCCTGGCCCATACCGCCAGCGCCTC 
CCCGTTGGCCTCCCAGTCCTCCTACAGCCTCCTGAGCCCACCCGACAGCGCCCGCGACCGCAC 
CGACTACGTCAACTCGGCCTTCACGGAGGACGAGGCCCTGTCCCAGCACTGTCAGCTTGAGAA 
GCCCTTGAGGCACCCTCCCCTGCCTGAAGCTGCTGTCACCCTGAAACGGCCACCCCCTTACCA 
GTGGGACCCCATGCTGGGTGAGGACGTTTGGGTTCCTCAAGAAAGGACAGCACAGACTTCAGG
GCCCAACCCCTTAAAACTGTCCTCTCTGATGCTGAGTCAGGGCCAGCACCTGGACGTGTCCCG
ACTGCCCTTCATCTCCCCCAAGTCTCCTGCCAGCCCCACTGCCACTTTCCAAACAGGCTATGG
GATGGGAGTGCCATATCCAGGAAGCTATAACAACCCCCCTTTGCCTGGAGTGCAGGCTCCCTG
CTCTCCCAAAGATGCCCTGTCCCCAACGCAGTTTGCACAACAGGAGCCTGCTGTGGTCCTTCA
GCCGCTGTACCCACCCAGCCTCTCCTATTGCACCCTGCCCCCCATGTACCCAGGAAGCAGCAC
GTGCTCTAGTTTACAGCTGCCACCTGTCGCCTTGCATCCATGGAGTTCCTACAGCGCTTGCCC

GCCCATGCAGAACCCCCAGGGCACTCTCCCCCCAAAGCCACACTTGGTGGTGGAGAAGCCCCT
TGTGTCCCCACCACCTGCCGACCTCCAAAGCCACTTGGGCACAGAGGTGATGGTAGAGACTGC
AGACAACTTCCAGGAAGTCCTCTCCCTGACCGAAAGCCCAGTCCCCCAGCGGACAGAAAAATT
TGGAAAGAAGAACCGGAAGCGCCTGGACAGCCGAGCAGAAGAAGGCAGCGTTCAGGCCATCAC
TGAGGGCAAAGTGAAGAAGGAGGCTAGGACTTTGAGTGACTTTAATTCCCTAATCTCCAGCCC
ACACCTGGGGAGAGAGAAGAAGAAAGTGAAGAGTCAGAAAGACCAACTGAAGTCAAAGAAGTT

GAATAAGACAAACGAGTTCCAGGACAGCTCCGAGAGCGAGCCTGAGCTGTTCATCAGCGGGGA
TGAGCTCATGAACCAGAGCCAGGGCAGCAGAAAGGGCTGGAAAAGCAAGCGCTCCCCACGGGC
CGCCGGCGAGCTGGAGGAGGCCAAGTGCCGGCGGGCCAGTGAGAAGGAGGACGGGCGGCTGGG
CAGCCAAGGCTTCGTGTACGTGATGGCCAACAAGCAGCCGCTGTGGAACGAGGCCACCCAGGT
CTACCAGCTGGACTTCGGGGGGCGGGTGACCCAGGAGTCCGCCAAGAACTTCCAGATTGAGTT
AGAGGGGCGGCAGGTGATGCAGTTTGGACGGATTGATGGCAGTGCGTACATTCTAGACTTCCA
GTATCCGTTCTCAGCCGTGCAGGCCTTTGCAGTTGCCCTGGCCAACGTGACTCAGCGCCTCAA
ATGAAGAGACTGGTGTGGGGAGGAGAGAGATGCAGAGAGCCTTTGGAAGAGGTCTTCGGAGAT
GCCAGAGGAGCCCTCTAGGGGTCCGATGCCTGGGAGGACCAGAAGCCAACAGCAAAACTGGAA
AAGCCCGGCAGGCCCAGGAGAGGGCGCTGACCTGTGGTCGTCATTTATTTGGTTGGGTTTTAT
TACCTTTTATTGTCTGTTCTTCTTTTCTTCTTTCATTTCAGTGGCATTTGGAAGCAAAGAGTG
CTAGGCACCTGCAGTTCTTTCAGGAAACAGCTTGGCTGTGGTAATGCTCTACTGGGCCCTTCA
GAATGAAGACAGTCTGCCTTAGAGCCTGCTATTCTTTTAGACATAGGGAGGATGCATTATCCT
GTATTCTCCTCCAACATCACCACTAGCGTAAAAGCAAAAAGCTTTTACAAAACACAGCCAAAA
ATTCTCAAGATGCAGGTTCTTGGGGAATGGGATGGGGACAGCATTTGATTTACACTGATTATG
TTACTCCCCAAAAGGTGACTTAATTAATAAAGGGCATTTGGGCAGACACACTGTGTTGGACCA
ACAAAGTAGGCTCTTTACAGGGGTGTTCTCACCAGGTAGAAATGCGATTTGCTCACTGAGGAT 
GTTGGGGAAGGGACGAAGGGTAAAGAAGAAACTGCACCGTATACACAGGTTCACATCACTCTG 
CAGACAGCAATGTGACTCAGCGTGTGACTTGTAGCAGCAGTACGAGGGCTACACTCCTCTGCT 
GAGGATGTCTACATTGAAAGCCTCCACTAGTTTCATCGTTTGTCAAGTTCCGTAGATCAGTGA 
TGGTCATTCAGCATGACTGGTTCTGGGAGAAGGTGAGAGACAAAAATGGAAAGATCCTGGCCT 
GTGGTAGTGGTAGCAGTTTTCTCAAATAATGTGGTGACAGTCACTAGATCCTCACATGCTGAG
AAACAGCCTCTACTCTCTCTGCCCCTTTTACTTTTTAATCTGGAATGCATTACTGTAAACATG
ATCTTTCCCATGAGATACCATGTTCTATGCCTTCCCATTCTAAAAGTGTGGACAACGCTTGAT
TTGAAACCACTCCTTTTCTCCTCTTGGCTACATTAAAATTCAGTTGACTACAAATGCTTTCTA
TCAAATTAGAAATGTAACCAAAAAAATGTTAAGTGTTCACCAGGGTATTAAAATACAGAGGGA
GTATGGTCA7VATCTTTTGACAAAATTTTCATGATCTTCTCTAACAAAAAAAGTTGTTTATTAA
CTGTACAGACTGTTTACTAAGGAGCTAAACCACTGAGAAAACGTTATTAAAATTGTAATACCT 
AGGTAGTTGGTTGATCACAGTTTGTTAATTGTATAAAAAAAAATTACCTAGAATATCTCTTCC 
CACTTCCTCGTCCTCGTGAGAACCTGTGGGCAGTATTCAAGCCCTGATGACAAAACCCAGTGT 

TGATGTTGTTTAGGATCAGGCCCCTCTCCTGGGCCTGCTGTGCAGGAGCACAGGAACTATCTG 
CTGGCGTTGGGTTCCAAATTTGCATTTTATTTGGAAACAGACAAGTAGAAGATGCTACAGAAA 
AGTATTTTCAAATTTAAACGTTTTTTAATCCCCTGTTTTAGTTAAAAAATTGGAAAAGAAACC
GACCCATTTTTTTCCCAGATCAAGATGACATGACATCACTCCCAATTCTCTCCAAACCCCAGA 



GAAATACTGACGAAGTTTTCTGATGTGGCAAAGGATATTTCCCATCTAATACCAGTTTCTCAT 
TTATATTTAACGTATTGGACCTGATATTTTTAGTGGGTGCATTCTTCCAGAAAGAATTCAGCA



ATGTTATCAGAATTAATTCTTTTATATGAGTTTATGTAGCTTGATATGGTGTTTCAGTGCTTA



TTGGTTGTGCAATAATGGTTATAGCCTGTTAGATAATCTAAATGCAATTCCCCTGTTTTGTCG
TTTAGGAGATAATTATTTATCTTGCTTTTCATAGTGTTCTTAGGAATTATTTTGTTGTTACGT 



TTTGGTGAGTTATACCCATTTTATTTATTTAGAAAAATAGTATCTTTGTTAACGACTTACATG 



GTCACAGTATATTTTGCTGCAAGAAATAAAGAGGATATGATAGAAGGTTTTTTTTTTTTTTTT 
TTTTTTTTTTTGAGACGGAGTCCCACTCTTGTCGCCCAACTAGAGTGTAGTGGCACAATCTCG 
GCTCCCCACAACCTCTGACTCCAGGGTTCAGGTGATTATTTTGCCTCAGCCTCCCAAGCAGCT



GGGATTATAGACACCCGCCAACACGCCAGGCTAATGTTTTTGTATTTTTAATAGAGATGGGGT 
TTTGCCATGTTGGCCAGGCTGGTCTTGAACTCCTGACCTCAGGTGATCCGCCCGCCTCGGCCT
CCCAAAGTGCTGGGATTACAGACGTGAGCCACCACTCCCGGCCCATAGAAGGTTTTTTGCTGG
ATAATTTGTAACTTTTCTAATTGGGAAAAAATTCCTATTAATCACTTAAAAATTTTTTTTTGT



CTCTTCTTTTGAAAATTTTATTTTTATCTGAAAATAACAGTTGATCTGAAATAAAAAGGGGAG 



ACCTATTAGAATGAGAGTAGCCAAGGAAAGAGTTACTAGGTAATAAGCTTCACTTTTTGTGTT 



CTAATTGTTTTTGAGATATAAAGACCCTGAAAAAGCCCATTTTAGAACCTGTTTAATAAGAGC 



AAATATAGGGGAAAATCTTTGAAATGAAAGCTACAAATACATGTGAGAAGAAAAAAATGGATT 



TTTTTAGCAAATAATTAACTAAGCTTCTAAATGCCTAGCTCCCTCCCCCAAAGGCGCTTTCCC 



CCGATGGAGGCACAGGCTTCTGTCTCGGATGTTTGGCGCACGTGAGTTTGTATGAGTTTGTAC 



CGGAGTGACCCCGGCAGCCACTGCCCACCTCCCCTCTACCCAGGGGCCTGAAAAGAGGGGCTG 



CCCTCCTGCGCCAAGGCAGACACAAGCTGCGGGCTGTGCGGTCCTAGTAGTGTGACGTTTCAG 



TTAATAGTGGTGGTCTTATTTTCAACTATGCTTTCATTCAGTCAGTCTCTGTTGACTAAATAC 



GACGAAAATTCATACTTTATGCAGGAGATTTCTAAAAATTTAATGTTTATTAATAGTTTATGA



ATATCAAGATACCTCATTGAATCCCTAAATTTAAAAGCAGTCCAGTAAAAGGTTAACTGTATA



AAGAATCTATGACTTTTTGAGGGAAGTGTGATATATTAACAAATATAACCAATTCTAAATTTG 



TTTTAGCTCTAACCTCATCAAACCAAAGGCACAGATTTGTGTACAATATACCCATTGAATGTA 



TATCCTGAGAAAAATTGGGGCCAAAGAAGCAGGAAAATCTCAAAGCTCTAATGGCAGCATAAA



TCAAAGAATTTCACAGGCTAGTGTTTTTATCCATAGCCATTGCTCCCTTGTCAAGTGTCTCAC 



AAGGACATGGAAGAATGTGTTATGTTCATCTTGTAATCATAGCAAAAAGTCTGCAAACCCCAG



GGTCAAGCCTGCTCTGCCACAGGGTTGGATGGTGACCTTGGGCAAGTCCCTGGGGCTGGCTAG 



GCCTCCACTTGTCCATCTGTGAAATGAAAGGATCAGCCTGGACAGCCCTCTAAACTCCCTTAC



AGCTCTCAGCCTAAGAGCGCAGCACTGAACAGCCTCATCATTCCACTTTTCATGGGAAATATA



TTTCACACCATTGCCTTTGTGTAGAGAAATATTTCTTTTCCTGTGTTAATGAGCTATGTACTG



AATATAAACCAGTGCATTTAAAGTAATATCTTTTGTGCACCTCTAAATGTGTTTGGAATTGTG 



TTTGTTCTCATAGAATATACAAAAGTACTGATTCTAGGTAAGAAGGAGTCTCCACGGGTGTGC 



CCTGCTCAGCTGGATGTCCATGAGAACAGCCATGAAATAAGTCACTACTTGTCCCCAAAACCA 
CAGGAATATATACCTAGGTCACCTCAAATTCCTGAGTGTGCTCTGCCATGTTACACGGTCTTC 



AAATTGAAAAGGTTTCTTGAAAAGGAAAGTTTGGCCCAGCAACTGGAGAAGGAGTCCATGGTG 



TCGCTGTGTGCCTGTATCATTTGGCCAAGTCAATGGTTGTAAGCAAAGTTAGTGGAGACAAAA 
ATGTGTCCAAAATGTCGTTTGAGTTCCTGGGATTTCTGTAATAGCACACAACTCAGAACTCTT 
CAGCATTTGTGTGATTCCTTACCTCTGGCTGATAAAACTCTAATGGGTTGTGGCTTACTTTGT 



TTCCATTTTCTTTGGCTTTGTGCAATTTTTGTGTAACTTTACTTGTACCTATATTTTCTGTTT 
ACAGTTCTTTTTAAGGGGAGGGGTAGGGTTCTAAGATCTTGTTGTTTATTGTAGATAAAAATT 
TTTTCGTGTTGTAGAAAAGCATGGGTTATGCGTTTGACTGAAAAAGACACTGTATTATTTACC 



AAAGGGGTATTGTTTTTGCATTTGTTTATAAATGCATTATTTTGGTACTGTAAATTTGGACAT 
AATTTCTGAGTTTATTACTACTGGCATTTTCTTTTTCCCTTTTTTTTTTTTTTAACCGTAAGT 
GCACGATGCAGGTGCATAGGCCCCAGACCAAACTAGACCACCAGCATGTTCATGTCCAGACCT 



CGGCAGTGGCGTGCACTGCTTGTGCACCTCAGTTCCTCCAGTGTTGGTTTGTTTGTTTTTTAA 
TTCAGCATCCTGCTGGTTTTACTTTCCAAGCAAGATCTGTTGCGACTCCCAAATGCGTTTTAA
TGAGCTCATCCTTATTTGCCTTTCTTCTTACGTATTTTGTGTATTAGATTGTGCAGGAGATAT 



TCTAGAAGGCATTAATGGTTTGCATTCAAAACGATGTGGTTTGTCCAAGTTATTTTCTGTCTT 
TATTACTGAGACGGATTAATCTCCTTATTTTTTTCTTGATGATTTGAAGTTGTAAGAGTTGTC 
CAGCTATTGCTTAATAAAATTTTGCAGATCAAAAAA 



ORF Start: ATG at 1358    ORF Stop: TGA at 5987

SEQ ID NO: 266    1543 aa    MW at 168953.6kD

NOV74a,

CG140170-01 Protein
Sequence



MYAAVEHGPVLCSDSNILCLSWKGRVPKSEKEKPVCRRRYYEEGWLATGNGRGWGVTFTSSH 
CRRDRSTPQRINFNLRGHNSEWLVRWNEPYQKLATCDADGGIFVWIQYEGRWSVELVNDRGA 
QVSDFTWSHDGTQALISYRDGFVLVGSVSGQRHWSSEINLESQITCGIWTPDDQQVLFGTADG 
QVIVMDCHGRMLAHVLLHESDGVLGMSWNYPIFLVEDSSESDTDSDDYAPPQDGPAAYPIPVQ 
NIKPLLTVSFTSGDISLMNNYDDLSPTVIRSGLKEWAQWCTQGDLLAVAGMERQTQLGELPN 
GPLLKSAMVKFYNVRGEHIFTLDTLVQRPIISICWGHRDSRLLMASGPALYWRVEHRVSSLQ 
LLCQQAIASTLREDKDVSKLTLPPRLCSYLSTAFIPTIKPPIPDPNNMRDFVSYPSAGNERLH
CTMKRTEDDPEVGGPCYTLYLEYLGGLVPILKGRRISKLRPEFVIMDPRTDSKPDEIYGNSLI
STVIDSCNCSDSSDIELSDDWAAKKSPKISRASKSPKLPRISIEARKSPKLPRAAQELSRSPR
LPLRKPSVGSPSLTRREFPFEDITHPTYLAQVTSNIWGTKFKIVGLAAFLPTNLGAVIYKTSL
LHLQPRQMTIYLPEVRKISMDYINLPVFNPNVFSEDEDDLPVTGASGVPENSPPCTVNIPIAP 
IHSSAQAMSPTQSIGLVQSLLANQNVQLDVLTNQTTAVGTAEHAGDSATQYPVSNRYSNPGQV 
IFGSVEMGRIIQNPPPLSLPPPPQGPMQLSTVGHGDRDHEHLQKSAKALRPTPQLAAEGDAW 
FSAPQEVQVTKINPPPPYPGTIPAAPTTAAPPPPLPPPQPPVDVCLKKGDFSLYPTSVHYQTP 
LGYERITTFDSSGNVEEVCRPRTRMLGSQNTYTLPGPGSSATLRLTATEKKVPQPCSSATLNR 
LTVPRYSIPTGDPPPYPEIASQLAQGRGAAQRSDNSLIHATLRRNNREATLKMAQLADSPRAP
LQPLAKSKGGPGGWTQLPARPPPALYTCSQCSGTGPSSQPGASLAHTASASPLASQSSYSLL 
SPPDSARDRTDYVNSAFTEDEALSQHCQLEKPLRHPPLPEAAVTLKRPPPYQWDPMLGEDVWV 
PQERTAQTSGPNPLKLSSLMLSQGQHLDVSRLPFISPKSPASPTATFQTGYGMGVPYPGSYNN 
PPLPGVQAPCSPKDALSPTQFAQQEPAWLQPLYPPSLSYCTLPPMYPGSSTCSSLQLPPVAL 
HPWSSYSACPPMQNPQGTLPPKPHLWEKPLVSPPPADLOSHLGTEVMVETADNFQEVLSLTE 
SPVPQRTEKFGKKNRKRLDSRAEEGSVQAITEGKVKKEARTLSDFNSLISSPHLGREKKKVKS 
QKDQLKSKKLNKTNEFQDSSESEPELFISGDELMNQSQGSRKGWKSKRSPRAAGELEEAKCRR 
ASEKEDGRLGSQGFVYVMANKQPLWNEATQVYQLDFGGRVTQESAKNFQIELEGRQVMQFGRI 
DGSAYILDFQYPFSAVQAFAVALANVTQRLK
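
The ORF coordinates and the protein length reported in Table 74A can be cross-checked with simple arithmetic. The following sketch is illustrative only and is not part of the analysis described here. It assumes that the ORF Start and ORF Stop entries give the 1-based position of the first base of the start codon and of the stop codon, respectively; the function name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Return the number of residues encoded between a start and a stop codon.

    Assumption: `start` and `stop` are the 1-based positions of the first
    base of the start codon and of the stop codon. The stop codon itself is
    not translated, so the coding region spans `stop - start` bases.
    """
    coding_bases = stop - start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_bases // 3


# NOV74a: ORF Start ATG at 1358, ORF Stop TGA at 5987
nov74a_length = orf_protein_length(1358, 5987)
print(nov74a_length)
```

Under that assumption the result agrees with the 1543 aa length stated for SEQ ID NO: 266.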



Further analysis of the NOV74a protein yielded the following properties shown in 
Table 74B. 



Table 74B. Protein Sequence Properties NOV74a 


PSort 
analysis: 


0.8800 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP
analysis:


No Known Signal Sequence Predicted



A search of the NOV74a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 74C.



Table 74C. Geneseq Results for NOV74a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date]


NOV74a
Residues/
Match
Residues


Identities/
Similarities for
the Matched
Region


Expect 
Value 


ABB97398


Novel human protein SEQ ID NO: 666 
- Homo sapiens, 678 aa. 
[WO200222660-A2, 21-MAR-2002] 


1..671
1..671


668/671 (99%) 
668/671 (99%) 


0.0 


ABB61656


Drosophila melanogaster polypeptide
SEQ ID NO 11760 - Drosophila
melanogaster, 1478 aa.
[WO200171042-A2, 27-SEP-2001]


37..669 
86..702 


302/650 (46%) 
393/650 (60%) 


e-150 


AAM92361 


Human digestive system antigen SEQ 


417..495
1..79


79/79(100%) 
79/79(100%) 


9e-42


[WO200155314-A2, 02-AUG-2001]


ABB10608


Human pancreatic cancer related 
polypeptide, SEQ ID NO: 257 - Homo 
sapiens, 99 aa. [WO200155206-A1, 02-
AUG-2001]


417..495
1..79


79/79(100%) 
79/79(100%) 


9e-42 


ABB58828


Drosophila melanogaster polypeptide 
SEQ ID NO 3276 - Drosophila 
melanogaster, 1205 aa. 
[WO200171042-A2, 27-SEP-2001]


20..370 
5..338 


102/366 (27%) 
169/366 (45%) 


4e-29 



In a BLAST search of public sequence databases, the NOV74a protein was found to
have homology to the proteins shown in the BLASTP data in Table 74D.



Table 74D. Public BLASTP Results for NOV74a


Protein
Accession
Number

Protein/Organism/Length 


NOV74a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q9NRJ4 


Tubby superfamily protein - Homo 
sapiens (Human), 1544 aa.


1..1543
1..1544 


1534/1544 (99%) 
1534/1544 (99%) 


0.0 


Q9JIL5 


Tubby superfamily protein (Tubby- 
like protein 4) - Mus musculus 
(Mouse), 1547 aa.


1..1543
1..1547


1455/1547 (94%)
1491/1547 (96%)


0.0 


Q9VB18


CG5586 protein - Drosophila 
melanogaster (Fruit fly), 1478 aa. 


37..669 
86..702 


302/650 (46%) 
393/650 (60%)


e-150 


Q922C2


Unknown (Protein for
IMAGE:3592258) - Mus musculus
(Mouse), 256 aa (fragment).


1288..1543
1..256


245/256 (95%)
251/256 (97%)


e-136 


T15282 


hypothetical protein M01D7.3 -
Caenorhabditis elegans, 559 aa.


14..495
103..538


151/492 (30%)
231/492 (46%)


1e-51



PFam analysis predicts that the NOV74a protein contains the domains shown in
Table 74E.

Table 74E. Domain Analysis of NOV74a 



Pfam Domain 


NOV74a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


Tub 


1466..1535


31/86 (36%) 
49/86 (57%) 


1.5e-27



Example 75.

The NOV75 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 75A. 



Table 75A. NOV75 Sequence Analysis 




SEQ ID NO: 267


1838 bp




NOV75a,

CG140179-01 DNA
Sequence 


AAATGGCCCAAGAAATAGATCTGAGTGCTCTCAAGGAGTTAGAACGCGAGGCCATTCTCCAGG 
TCCTGTACCGAGACCAGGCGGTTCAAAACACAGAGGAGGAGAGGACACGGAAACTGAAAACAC 
ACCTGCAGCATCTCCGGTGGAAAGGAGCGAAGAACACGGACTGGGAGCACAAAGAGAAGTGCT
GTGCGCGCTGCCAGCAGGTGCTGGGGTTCCTGCTGCACCGGGGCGCCGTGTGCCGGGGCTGCA 
GCCACCGCGTGTGTGCCCAGTGCCGAGTGTTCCTGAGGGGGACCCATGCCTGGAAGTGCACGG 
TGTGCTTCGAGGACAGGAATGTCAAAATAAAAACTGGAGAATGGTTCTATGAGGAACGAGCCA
AGAAATTTCCAACTGCAGGCAAACATGAGACAGTTGGAGGGCAGCTCTTGCAATCTTATCAGA 
AGCTGAGCAAAATTTCTGTGGTTCCTCCTACTCCACCTCCTGTCAGCGAGAGCCAGTGCAGCC 
GCAGTAGGCTCCAGGAGTTTGGTCAGTTTAGAGGATTTAATAAGTCCGTGGAAAATTTGTTTC 
TGTCTCTTGCTACCCACGTGAAAGAGCTCTCCAAATCCCAGAATGATATGACTTCTGAGAAGC 
ATCTTCTCGCCACGGGCCCCAGGCAGTGTGTGGGACAGACAGAGAGACGGAGCCAGTCTGACA 
CTGCGGTCAACGTCACCACCAGGAAGGTCAGTGCACCAGATATTCTGAAACCTCTCAATCAAG
AGGATCCCAAATGCTCTACTAACCCTATTTTGAAGCAACAGAATCTCCCATCCAGTCCGGCAC 
CCAGTACCATATTCTCTGGAGGTTTTAGACACGGAAGTTTAATTAGCATTGACAGCACCTGTA 
CAGAGATGGGCAATTTTGACAATGCTAATGTCACTGGAGAAATAGAATTTGCCATTCATTATT 
GCTTCAAAACCCATTCTTTAGAAATATGCATCAAGGCCTGTAAGAACCTTGCCTATGGAGAAG 
AAAAGAAGAAAAAGTGCAATCCGTATGTGAAGACCTACCTGTTGCCCGACAGATCCTCCCAGG
GAAAGCGCAAGACTGGAGTCCAAAGGAACACCGTGGACCCGACCTTTCAGGAGACCTTGAAGT 
ATCAGGTGGCCCCTGCCCAGCTGGTGACCCGGCAGCTGCAGGTCTCGGTGTGGCATCTGGGCA 
CGCTGGCCCGGAGAGTGTTTCTTGGAGAAGTGATCATTCCTCTGGCCACGTGGGACTTTGAAG 
ACAGCACAACACAGTCCTTCCGCTGGCATCCGCTCCGGGCCAAGGCGGAGAAATACGAAGACA 
GCGTTCCTCAGAGTAATGGAGAGCTCACAGTCCGGGCTAAGCTGGTTCTCCCTTCACGGCCCA 
GAAAACTCCAAGAGGCTCAAGAAGGTCAGCCATCACTTCATGGTCAACTTTGTTTGGTAGTGC
TAGGAGCCAAGAATTTACCTGTGCGGCCAGATGGCACCTTGAACTCATTTGTTAAGGGGTGTC 
TCACTCTGCCAGACCAACAAAAACTGAGACTGAAGTCGCCAGTCCTGAGGAAGCAGGCTTGCC 
CCCAGTGGAAACACTCATTTGTCTTCAGTGGCGTAACCCCAGCTCAGCTGAGGCAGTCAAGCT
TGGAGTTAACTGTCTGGGATCAGGCCCTCTTTGGAATGAACGACCGCTTGCTTGGAGGAACCA 
GACTTGGTTCAGAGGGAGACACAGCTGTTGGCGGGGATGCATGCTCACTATCGAAGCTCCAGT 
GGCAGAAAGTCCTTTCCAGCCCCAATCTATGGACAGACATGACTCTTGTCCTGCACTGACATG 
AAGGCCTCAAG 




ORF Start: ATG at 3 


ORF Stop: TGA at 1821




SEQ ID NO: 268


606 aa MW at 68204.5kD


NOV75a,

CG140179-01 Protein
Sequence 


MAQEIDLSALKELEREAILQVLYRDQAVQNTEEERTRKLKTHLQHLRWKGAKNTDWEHKEKCC 
ARCQQVLGFLLHRGAVCRGCSHRVCAQCRVFLRGTHAWKCTVCFEDRNVKIKTGEWFYEERAK
KFPTAGKHETVGGQLLQSYQKLSKISVVPPTPPPVSESQCSRSRLQEFGQFRGFNKSVENLFL
SLATHVKELSKSQNDMTSEKHLLATGPRQCVGQTERRSQSDTAVNVTTRKVSAPDILKPLNQE 
DPKCSTNPILKQQNLPSSPAPSTIFSGGFRHGSLISIDSTCTEMGNFDNANVTGEIEFAIHYC 
FKTHSLEICIKACKNLAYGEEKKKKCNPYVKTYLLPDRSSQGKRKTGVQRNTVDPTFQETLKY 
QVAPAQLVTRQLQVSVWHLGTLARRVFLGEVIIPLATWDFEDSTTQSFRWHPLRAKAEKYEDS
VPQSNGELTVRAKLVLPSRPRKLQEAQEGQPSLHGQLCLVVLGAKNLPVRPDGTLNSFVKGCL
TLPDQQKLRLKSPVLRKQACPQWKHSFVFSGVTPAQLRQSSLELTVWDQALFGMNDRLLGGTR
LGSEGDTAVGGDACSLSKLQWQKVLSSPNLWTDMTLVLH 




Further analysis of the NOV75a protein yielded the following properties shown in
Table 75B.



Table 75B. Protein Sequence Properties NOV75a


PSort 
analysis: 


0.6000 probability located in nucleus; 0.3000 probability located in microbody 
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 
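
The PSort entry in Table 75B is a semicolon-separated list of scored locations, which is easy to turn into a lookup for downstream comparison. The sketch below is an illustration under the assumption that every entry follows the "<score> probability located in <location>" wording seen in this table; `parse_psort` is a hypothetical helper name, not part of PSort itself.

```python
import re

# PSort line for NOV75a, copied from Table 75B
PSORT_TEXT = (
    "0.6000 probability located in nucleus; "
    "0.3000 probability located in microbody (peroxisome); "
    "0.1000 probability located in mitochondrial matrix space; "
    "0.1000 probability located in lysosome (lumen)"
)


def parse_psort(text: str) -> dict:
    """Map each predicted location to its PSort score."""
    scores = {}
    for entry in text.split(";"):
        match = re.match(r"\s*([0-9.]+) probability located in (.+?)\s*$", entry)
        if match:
            scores[match.group(2)] = float(match.group(1))
    return scores


scores = parse_psort(PSORT_TEXT)
best = max(scores, key=scores.get)  # location with the highest score
print(best, scores[best])
```

The highest-scoring location for NOV75a is the nucleus, as listed first in the table.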



A search of the NOV75a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 75C.



Table 75C. Geneseq Results for NOV75a


Geneseq 
Identifier 


Protein/Organism/Length [Patent #,
Date]


NOV75a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAG67216 


Amino acid sequence of human Parkin- 
Associated Protein 1 (PAP1) - Homo 
sapiens, 610 aa. [WO200160857-A2,
23-AUG-2001]


1..606
1..610


602/610 (98%)
604/610 (98%) 


0.0 


AAG67214


Amino acid sequence of human Parkin-
Associated Protein 1 (PAP1) - Homo
sapiens, 610 aa. [WO200160857-A2,
23-AUG-2001]


1..606
1..610


602/610 (98%)
604/610 (98%)


0.0 


AAU19743


Human novel extracellular matrix
protein, Seq ID No 393 - Homo sapiens,
363 aa. [WO200155368-A1, 02-AUG-
2001]


144..502
1..363


352/363 (96%)
353/363 (96%)


0.0


AAU87138 


Novel central nervous system protein 

#48 - Homo sapiens, 363 aa. 

[WO200155318-A2, 02-AUG-2001]


144..502 
1..363 


352/363 (96%) 
353/363 (96%) 


0.0 


AAG67212


Amino acid sequence of human Parkin-
Associated Protein 1 (PAP1) - Homo
sapiens, 344 aa. [WO200160857-A2,
23-AUG-2001]


265..606
1..344 


340/344 (98%) 
341/344 (98%) 


0.0 



In a BLAST search of public sequence databases, the NOV75a protein was found to
have homology to the proteins shown in the BLASTP data in Table 75D.



Table 75D. Public BLASTP Results for NOV75a


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV75a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q99N54 


Synaptotagmin-like protein 3-a - Mus 
musculus (Mouse), 607 aa. 


1..606 
1..607 


485/607 (79%) 
543/607 (88%) 


0.0 


CAC69571 


Sequence 1 from Patent WO0160857
- Homo sapiens (Human), 344 aa 
(fragment). 


265..606 
1..344 


340/344 (98%) 
341/344 (98%) 


0.0 


Q99N49 


Synaptotagmin-like protein 3-a + 3S-I 
(Synaptotagmin-like 3) - Mus 
musculus (Mouse), 412 aa.


195..606
1..412


332/412 (80%)
370/412 (89%)


0.0 


Q99N48 


Synaptotagmin-like protein 3-a delta 
3S-II - Mus musculus (Mouse), 393 
aa. 


216..606
3..393 


316/391 (80%) 
351/391 (88%) 


0.0 


Q99N79 


Synaptotagmin-like protein 3-b - Mus 
musculus (Mouse), 393 aa. 


216..606
3..393


315/391 (80%) 
351/391 (89%) 


0.0 



PFam analysis predicts that the NOV75a protein contains the domains shown in
Table 75E.



Table 75E. Domain Analysis of NOV75a


Pfam Domain


NOV75a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 



RPH3A_effector 


1..252 


62/326(19%) 
117/326 (36%) 


0.27 


C2 


321..410 


26/97 (27%) 
64/97 (66%) 


1.3e-11


C2 


478..567 


27/98 (28%) 
62/98 (63%) 


0.0029



Example 76.

The NOV76 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 76A. 



Table 76A. NOV76 Sequence Analysis 




SEQ ID NO: 269


1545 bp 




NOV76a,

CG140392-01 DNA
Sequence


TACTGAGGCTTTCGGGACGGCGGCGGGAAGATGGCGGCCTCCAGGAATGGGTTTGAAGCCGTG
GAGGCAGAGGGCAGCGCAGGGTGCCGGGGAAGCTCGGGAATGGAGGTGGTGCTTCCTTTGGAT


CCTGCCGTCCCCGCCCCGCTGTGCCCTCACGGTCCCACTCTTCTGTTTGTAAAGGTGACCCAA 
GGGAAAGAAGAAACTCGGAGGTTTTATGCCTGTTCAGCCTGTAGAGATAGAAAAGACTGTAAT 
TTTTTTCAGTGGGAAGATGAAAAGTTGTCAGGAGCTAGACTTGCTGCCCGAGAAGCTCATAAC 
CGAAGATGTCAGCCTCCCCTGTCCCGAACGCAGTGTGTGGAAAGGTACTTGAAGTTTATTGAG 
TTGCCCTTGACTCAGAGAAAGTTTTGTCAAACATGTCAGCAGTTGTTGTTACCAGATGACTGG 
GGGCAACATAGTGAGCATCAGGTTCTGGGTAATGTGTCCATTACCCAGTTAAGAAGGCCCAGT 
CAACTCCTTTATCCACTGGAAAACAAGAAGACAAATGCCCAGTATCTGTTTGCTGATCGGAGC 
TGTCAGTTCTTGGTAGACTTACTTTCTGCCCTCGGATTCAGAAGAGTACTGTGTGTTGGAACA 
CCAAGGTTGCATGAGCTGATCAAGTTGACAGCATCAGGTGACAAGAAGTCTAACATTAAAAGC 
CTTTTATTGGATATTGATTTTAGGTATTCACAGTTTTATATGGAAGATAGCTTTTGCCATTAT 
AATATGTTTAACCATCATTTCTTTGATGGAAAGACTGCCCTTGAAGTATGCAGAGCATTTTTA 
CAGGAAGATAAAGGCGAAGGAATCATTATGGTTACGGATTCTCCGTTTGGTGGCTTGGTTGAA 
CCTCTGGCTATTACATTCAAGAAGTTAATTGCTATGTGGAAAGAAGGTCAAAGCCAAGATGAC 
AGTCACAAAGAACTACCCATTTTCTGGATTTTCCCCTATTTTTTTGAATCCCGAATTTGTCAG 
TTTTTTCCAAGCTTCCAGATGCTGGATTACCAGGTAGATTATGATAATCATGCACTTTATAAA 
CACGGAAAGACAGGTCGAAAACAGTCTCCCGTGCGTATTTTCACCAACATTCCGCCCAACAAA
ATAATCCTTCCTACTGAAGAAGGGTACAGGTTTTGCTCTCCGTGTCAACGGTATGTTTCTCTA 
GAGAATCAACACTGTGAGCTCTGTAATTCTTGCACATCCAAGGATGGCAGGAAATGGAACCAT 
TGCTTTCTCTGTAAAAAGTGTGTAAAGCCTGCCTGGATCCACTGTAGCATCTGCAATCACTGT 
GCTGTTCCAGATCATTCTTGTGAGGGCCCCAAACATGGCTGCTTTATTTGTGGTGAACTGGAT
CATAAACGCAGTACTTGTCCTAACATTGCTACATCTAAGAGAGCTAACAAGTCAGTCAAAAAA 
AAAAAAAAAAAAAAAAAAAACAAAAGCCACAGAGAAAAGAAAGGGCACCACCGGTAGGTTTGG 
AATCAGAAAATTTTGGCAAGAAGAGAGTTGAAA 




ORF Start: ATG at 31 


ORF Stop: TAG at 1504



SEQ ID NO: 270


491 aa


MW at 56449.4kD


NOV76a,

CG140392-01 Protein
Sequence 


MAASRNGFEAVEAEGSAGCRGSSGMEVVLPLDPAVPAPLCPHGPTLLFVKVTQGKEETRRFYA
CSACRDRKDCNFFQWEDEKLSGARLAAREAHNRRCQPPLSRTQCVERYLKFIELPLTQRKFCQ 
TCQQLLLPDDWGQHSEHQVLGNVSITQLRRPSQLLYPLENKKTNAQYLFADRSCQFLVDLLSA
LGFRRVLCVGTPRLHELIKLTASGDKKSNIKSLLLDIDFRYSQFYMEDSFCHYNMFNHHFFDG 
KTALEVCRAFLQEDKGEGIIMVTDSPFGGLVEPLAITFKKLIAMWKEGQSQDDSHKELPIFWI 
FPYFFESRICQFFPSFQMLDYQVDYDNHALYKHGKTGRKQSPVRIFTNIPPNKIILPTEEGYR
FCSPCQRYVSLENQHCELCNSCTSKDGRKWNHCFLCKKCVKPAWIHCSICNHCAVPDHSCEGP 
KHGCFICGELDHKRSTCPNIATSKRANKSVKKKKKKKKNKSHREKKGHHR 



Further analysis of the NOV76a protein yielded the following properties shown in 
Table 76B. 



Table 76B. Protein Sequence Properties NOV76a


PSort 
analysis: 

0.9800 probability located in nucleus; 0.3725 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 
probability located in lysosome (lumen) 


SignalP
analysis:


No Known Signal Sequence Predicted



A search of the NOV76a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 76C.



Table 76C. Geneseq Results for NOV76a



Geneseq
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV76a 
Residues/ 
Match 
Residues 


Identities/
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU28264


Novel human secretory protein, Seq ID 
No 621 - Homo sapiens, 328 aa. 
[WO200166689-A2, 13-SEP-2001] 


1..303
11..314


264/304 (86%) 
271/304 (88%) 


e-153 


AAU28076


Novel human secretory protein, Seq ID 
No 245 - Homo sapiens, 237 aa. 
[WO200166689-A2, 13-SEP-2001] 


1..233 
1..233 


233/233 (100%) 
233/233 (100%) 


e-136 


ABB67501


Drosophila melanogaster polypeptide 
jCy iu zyzyj - ijrosopruia 
melanogaster, 349 aa. [WO200171042-
A2, 27-SEP-2001]


356..491 

I 70..JJ I 


53/139 (38%) 

77/1 10 f^^OASi 


2e-25 


ABB68484


Drosophila melanogaster polypeptide
SEQ ID NO 32244 - Drosophila
melanogaster, 968 aa. [WO200171042-
A2, 27-SEP-2001]


389..431
194..234 


20/43 (46%) 
23/43 (52%) 


0.003 


ABB61064


Drosophila melanogaster polypeptide 
SEQ ID NO 9984 - Drosophila 
melanogaster, 285 aa. [WO200171042- 
A2, 27-SEP-2001] 


408..438 
105..136


15/32 (46%)
20/32 (61%)


0.050 



In a BLAST search of public sequence databases, the NOV76a protein was found to
have homology to the proteins shown in the BLASTP data in Table 76D.



Table 76D. Public BLASTP Results for NOV76a


Protein
Accession
Number


Protein/Organism/Length


NOV76a
Residues/
Match
Residues


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q9D2ll


4930449I23Rik protein - Mus 
musculus (Mouse), 489 aa. 


1..485 
1..483 


388/485 (80%) 
431/485 (88%) 


0.0 


Q9H5U6


CDNA: FLJ23024 fls, clone 
LNG01684 - Homo sapiens 
(Human), 279 aa. 


235..485 
1..251


239/251 (95%) 
245/251 (97%) 


e-151 


Q96AN7 


Hypothetical 23.6 kDa protein - 
Homo sapiens (Human), 205 aa. 


25..229 
1..205 


205/205 (100%)
205/205 (100%)


e-119 


Q9NZY3 


HSPC052 - Homo sapiens 
(Human), 127 aa. 


25..144
1..119


116/120 (96%)
117/120 (96%)


2e-65 


AAF58228 


CG12863-PA - Drosophila 
melanogaster (Fruit fly), 425 aa. 


32..491 
4..407 


150/472 (31%) 
225/472 (46%) 


2e-62



PFam analysis predicts that the NOV76a protein contains the domains shown in
Table 76E.



Table 76E. Domain Analysis of NOV76a 



Pfam Domain 


NOV76a Match Region 


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


zf-CCHC 


443..460 


6/18(33%) 
11/18(61%) 


0.045



Example 77.

The NOV77 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 77A. 



Table 77A. NOV77 Sequence Analysis




SEQ ID NO: 271 1945 bp 


NOV77a, 

CG140727-01 DNA
Sequence 


CGAAGGAGGTGGTGGCTGCGTTGGGCTCCGGGAAGCCGTTCGGGCTGGGGCTGTCGGCCGCGG 


GGCGGAGGCACTCGCGCGGGGGGTAATTCGGGGTCTGGGTTCTGGTGCCGCGCAGCTTTCCCC 


GGATTGACTTGGCCTCTACTTCTTGTTAAGGAAATTCATCTCTTGTTTTATCAGGTGTGTGTG 


GTTTCAGCGCAGCATGGCTGTGGTCATCCGTTTGCAAGGTCTCCCAATTGTGGCGGGGACCAT 


GGACATTCGCCACTTCTTCTCTGGATTGACCATTCCTGATGGGGGCGTGCATATTGTAGGGGG


TGAACTGGATGGCCCACTGCGTGACCTTGGTTCAGCTGTCCATTTCCTGTGACCATCTCATTG

ACAAGGACATCGGCTCCAAGTCTGACCCACTCTGCGTCCTTTTACAGGATGTGGGAGGGGGCA

GCTGGGCTGAGCTTGGCCGGACTGAACGGGTGCGGAACTGCTCAAGCCCTGAGTTCTCCAAGA

CTCTACAGCTTGAGTACCGCTTTGAGACAGTCCAGAAGCTACGCTTTGGAATCTATGACATAG

ACAACAAGACGCCAGAGCTGAGGGATGATGACTTCCTAGGGGGTGCTGAGTGTTCCCTAGGAC

AGATTGTGTCCAGCCAGGTACTGACTCTCCCCTTGATGCTGAAGCCTGGAAAACCTGCTGGGC

GGGGGACCATCACGGTCTCAGCTCAGGAATTAAAGGACAATCGTGTAGTAACCATGGAGGTAG

AGGCCAGAAACCTAGATAAGAAGGACTTCCTGGGAAAATCAGATCCATTTCTGGAGTTCTTCC 

GCCAGGGTGATGGGAAATGGCACCTGGTGTACAGATCTGAGGTCATCAAGAACAACCTGAACC 

CTACATGGAAGCGTTTCTCAGTCCCCGTTCAGCATTTCTGTGGTGGGAACCCCAGCACACCCA 

TCCAGGTGCAATGCTCCGATTATGACAGTGACGGGTCACATGATCTCATCGGTACCTTCCACA 

CCAGCTTGGCCCAGCTGCAGGCAGTCCCGGCTGAGTTTGAATGCATCCACCCTGAGAAGCAGC 

AGAAAAAGAAAAGCTACAAGAACTCTGGAACTATCCGTGTCAAGATTTGTCGGGTAGAAACAG 

AGTACTCCTTTCTGGACTATGTGATGGGAGGCTGTCAGATCAACTTCACTGTGGGCGTGGACT 

TCACTGGCTCCAATGGAGACCCCTCCTCACCTGACTCCCTACACTACCTGAGTCCAACAGGGG 

TCAATGAGTACCTGATGGCACTGTGGAGTGTGGGCAGCGTGGTTCAGGACTATGACTCAGACA

AGCTGTTCCCTGCATTTGGATTTGGGGCCCAGGTTCCCCCTGACTGGCAGGTCTCGCATGAAT 

TTGCCTTGAATTTCAACCCCAGTAACCCCTACTGTGCAGGCATCCAGGGCATTGTGGATGCCT 

ACCGCCAAGCCCTGCCCCAAGTTCGCCTCTATGGCCCTACCAACTTTGCACCCATCATCAACC 

ATGTGGCCAGGTTTGCAGCCCAGGCTGCACATCAGGGGACTGCCTCGCAATACTTCATGCTGT 

TGCTGCTGACTGATGGTGCTGTGACGGATGTGGAAGCCACACGTGAGGCTGTGGTGCGTGCCT

CGAACCTGCCCATGTCAGTGATCATTGTGGGTGTGGGTGGTGCTGACTTTGAGGCCATGGAGC 

AGCTGGACGCTGATGGTGGACCCCTGCATACACGTTCTGGGCAGGCTGCTGCCCGCGACATTG 

TGCAGTTTGTACCCTACCGCCGGTTCCAGAATGTGAGTAGGTGCAAATTTGATCTGGGAGTTT 

ACCCTTTCACCTTTCCCTGATGTGAGTAGGGTTGAGGCCCAAGGGATGCTCACCTTTTCCTCT 


AAGCCTGGGATATGGTTGGGTTCTTGAGCATGAAATACAGACTTTGGGCAGCTGT 




ORF Start: ATG at 324    ORF Stop: TGA at 1845




SEQ ID NO: 272    507 aa    MW at 56052.8kD


NOV77a,

CG140727-01 Protein
Sequence


MAHCVTLVQLSISCDHLIDKDIGSKSDPLCVLLQDVGGGSWAELGRTERVRNCSSPEFSKTLQ
LEYRFETVQKLRFGIYDIDNKTPELRDDDFLGGAECSLGQIVSSQVLTLPLMLKPGKPAGRGT
ITVSAQELKDNRVVTMEVEARNLDKKDFLGKSDPFLEFFRQGDGKWHLVYRSEVIKNNLNPTW
KRFSVPVQHFCGGNPSTPIQVQCSDYDSDGSHDLIGTFHTSLAQLQAVPAEFECIHPEKQQKK
KSYKNSGTIRVKICRVETEYSFLDYVMGGCQINFTVGVDFTGSNGDPSSPDSLHYLSPTGVNE
YLMALWSVGSVVQDYDSDKLFPAFGFGAQVPPDWQVSHEFALNFNPSNPYCAGIQGIVDAYRQ
ALPQVRLYGPTNFAPIINHVARFAAQAAHQGTASQYFMLLLLTDGAVTDVEATREAVVRASNL
PMSVIIVGVGGADFEAMEQLDADGGPLHTRSGQAAARDIVQFVPYRRFQNVSRCKFDLGVYPF
TFP 



Further analysis of the NOV77a protein yielded the following properties shown in 
Table 77B. 



Table 77B. Protein Sequence Properties NOV77a 


PSort 
analysis: 


0.4500 probability located in cytoplasm; 0.3359 probability located in microbody 
(peroxisome); 0.1756 probability located in lysosome (lumen); 0.1000 probability
located in mitochondrial matrix space 


SignalP 
analysis: 




Cleavage site between residues 14 and 15 






A search of the NOV77a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 77C.



Table 77C. Geneseq Results for NOV77a


Geneseq
Identifier


Protein/Organism/Length [Patent #,
Date]


NOV77a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAU19736


Human novel extracellular matrix
protein, Seq ID No 386 - Homo
sapiens, 540 aa. [WO200155368-A1,
02-AUG-2001] 


2..494 
6..503 


331/498 (66%) 
404/498 (80%) 


0.0 


AAM39997 


Human polypeptide SEQ ID NO 3142 -
Homo sapiens, 548 aa.
[WO200153312-A1, 26-JUL-2001]


3..494
21..516 


294/497 (59%) 
378/497 (75%) 


0.0 


AAB24231


Human vesicle associated protein 10
SEQ ID NO:10 - Homo sapiens, 532
aa. [WO200060082-A2, 12-OCT-2000]


3..494 
5..500 


294/497 (59%) 
378/497 (75%) 


0.0 


AAG62450 


Human membrane-binding protein 37 - 

Homo sapiens, 336 aa. 

[WO200138363-A1, 31-MAY-2001]


1..310
30..333


302/310 (97%)
303/310 (97%)


e-178


AAU19854


Human novel extracellular matrix
protein, Seq ID No 504 - Homo
sapiens, 399 aa. [WO200155368-A1,
02-AUG-2001]


57..440
6..394


259/389 (66%)
318/389 (81%)


e-160









In a BLAST search of public sequence databases, the NOV77a protein was found to
have homology to the proteins shown in the BLASTP data in Table 77D.



Table 77D. Public BLASTP Results for NOV77a


Protein 

Accession 

Number 


Protein/Organism/Length 


NOV77a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for the 
Matched Portion 


Expect 
Value 


Q99829 


Copine I - Homo sapiens (Human), 
537 aa.


1..494
1..494


492/494 (99%) 
492/494 (99%) 


0.0 


Q925K4 


Copine 1 protein - Mus musculus 
(Mouse), 454 aa (fragment). 


1..452 
1..451


417/452 (92%) 
436/452 (96%) 


0.0 


Q925K5 


Copine 1 protein - Mus musculus 
(Mouse), 448 aa (fragment). 


1..449
1..448


414/449 (92%) 
433/449 (96%) 


0.0 


O75131


Copine III - Homo sapiens
(Human), 537 aa.


2..494
3..500


332/498 (66%) 
405/498 (80%) 


0.0 


Q96FN4 


Unknown (protein for MGC:16924)
- Homo sapiens (Human), 446 aa. 


88..494 
3..414 


255/412 (61%) 
325/412 (77%)


e-158 
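
The Expect Value column mixes plain decimals such as 0.0 with abbreviated exponents such as e-158, so the entries need a small normalization step before they can be compared numerically. The sketch below is illustrative and assumes that a bare "e-NNN" stands for 1e-NNN; `parse_expect` is a hypothetical helper, and the three hits are copied from Table 77D.

```python
def parse_expect(value: str) -> float:
    """Convert an Expect Value cell to a float.

    Assumption: a cell that starts with "e" (for example "e-158") is an
    abbreviated form of "1e-158".
    """
    text = value.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


# Accession -> Expect Value, as listed in Table 77D
hits = {"Q99829": "0.0", "Q925K4": "0.0", "Q96FN4": "e-158"}

# Rank hits from most to least significant (smallest Expect Value first)
ranked = sorted(hits, key=lambda accession: parse_expect(hits[accession]))
print(ranked)
```

Because Python's sort is stable, hits with identical Expect Values keep the order in which they appear in the table.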



PFam analysis predicts that the NOV77a protein contains the domains shown in
Table 77E.



Table 77E. Domain Analysis of NOV77a


Pfam Domain


NOV77a Match Region


Identities/ 
Similarities 

for the Matched Region 


Expect Value 


C2 



7..98 


22/100 (22%) 
60/100 (60%) 


0.036 



C2 


140..228 


30/101 (30%) 
63/101 (62%) 


5.4e-06 



Example 78. 

The NOV78 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 78A.



Table 78A. NOV78 Sequence Analysis


SEQ ID NO: 273    435 bp


NOV78a,

CG141070-01 DNA
Sequence


GGGCAAGTGCACAGTGGTCCTGGCGGCGCCATGTCATTCTGCAGCTTCTTCGGGGGCAAGGTT 


TTCCAGAATCACTTTGAGCCAGGCGTCTATATGTGTGCCAAGTGTGGCTATGAGCTGTTTCCC 
AGCCGCTCAAAGTACACATACTCATTCCCCTGGCCGGTGTTCACCAAGACCATCCGTTCTGAC 
AGCGTGGCCAAGCGCCCAGAGCACAATCATCCTGAAAGTCTTGAAGGTGTCTCGTGCAAGTGT 
GGCAACACGTTGAGCCACAAGTTCCTGAACGATGGCCCCAAGCTGCGGCAGTCCCGATTCATA 
TTCAGCAGCTCGCTGAAGTTTGTCCCTAAAGGCAAAGAAACTTCTGCTTCCCAGGCGCACTAG 
GCGGGCAGCCCACACCGACCCCAGATGGCCACGGCACTAAGGCCACACACTGGCCAT 


ORF Start: ATG at 31    ORF Stop: TAG at 376


SEQ ID NO: 274    115 aa    MW at 13039.8kD


NOV78a,

CG141070-01 Protein
Sequence 


MSFCSFFGGKVFQNHFEPGVYMCAKCGYELFPSRSKYTYSFPWPVFTKTIRSDSVAKRPEHNH 
PESLEGVSCKCGNTLSHKFLNDGPKLRQSRFIFSSSLKFVPKGKETSASQAH 
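
The relationship between the NOV78a DNA sequence, its ORF Start position, and the encoded protein can be illustrated by translating the first line of the DNA sequence from position 31. The sketch below is an illustration only; it assumes the standard genetic code and a 1-based ORF Start position, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate `dna` from the 1-based position `orf_start` up to a stop codon."""
    peptide = []
    for pos in range(orf_start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon reached
            break
        peptide.append(residue)
    return "".join(peptide)


# First 63 bases of the NOV78a DNA sequence (SEQ ID NO: 273)
FIRST_LINE = "GGGCAAGTGCACAGTGGTCCTGGCGGCGCCATGTCATTCTGCAGCTTCTTCGGGGGCAAGGTT"
peptide = translate(FIRST_LINE, 31)
print(peptide)
```

The translated fragment corresponds to the first eleven residues of the NOV78a protein sequence (SEQ ID NO: 274).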


Further analysis of the NOV78a protein yielded the following properties shown in 
Table 78B. 




Table 78B. Protein Sequence Properties NOV78a 


PSort 
analysis: 


0.6400 probability located in microbody (peroxisome); 0.4500 probability located in 
cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen) 


SignalP 
analysis: 


No Known Signal Sequence Predicted 



A search of the NOV78a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 78C.



Table 78C. Geneseq Results for NOV78a


Geneseq
Identifier


Protein/Organism/Length [Patent #, 
Date] 


NOV78a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB37402


Human secreted protein BLAST search
protein SEQ ID NO: 112 - Homo
sapiens, 99 aa. [WO200058335-A1, 05-
OCT-2000]


7..104
1..99 


78/100 (78%) 
85/100 (85%) 


6e-39 


AAY60509 


Human normal bladder tissue EST 
encoded protein 181 - Homo sapiens, 
138 aa. [DE19818620-A1, 28-OCT- 
1999] 


1..94
45..138 


74/95 (77%) 
82/95 (85%) 


8e-39 


AAW46757


[WO9748797-A1, 24-DEC-1997]


1..83


71/84 (84%)


4e-32
ABG34053


Human Pro peptide #24 - Homo sapiens,
192 aa. [WO200224888-A2, 28-MAR-
2002]


5..111
69..175


36/109 (33%) 
56/109 (51%) 


8e-09 


AAM41523


Human polypeptide SEQ ID NO 6454 -
Homo sapiens, 201 aa. [WO200153312-
A1, 26-JUL-2001]


5..111
78..184


36/109 (33%)
56/109 (51%) 


8e-09 



In a BLAST search of public sequence databases, the NOV78a protein was found to
have homology to the proteins shown in the BLASTP data in Table 78D.



Table 78D. Public BLASTP Results for NOV78a


Protein
Accession
Number


Protein/Organism/Length 


NOV78a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


Q96RX6 


Annexin A2 like-? : selenoprotein X 
- Homo sapiens (Human), 116 aa.


1..115
1..116


94/117 (80%)
102/117 (86%)


1e-48


Q9NZV6 


Selenoprotein X 1 (Protein 
HSPC270) - Homo sapiens 
(Human), 116 aa.


1..115
1..116


94/117 (80%)
102/117 (86%)


1e-48


Q9JLC3 


Selenoprotein X 1 (Selenoprotein R) 
- Mus musculus (Mouse), 116 aa.


1..115
1..116


88/117 (75%)
96/117 (81%)


1e-44


Q9BTV2 

Similar to selenoprotein X, 1 -
Homo sapiens (Human), 94 aa.


1..94
1..94


74/95 (77%) 
82/95 (85%) 


2e-38 


AAM31330


Transcriptional regulator -
Methanosarcina mazei
(Methanosarcina frisia), 140 aa.


7..108
34..138


40/107 (37%) 
57/107 (52%) 


3e-09 






PFam analysis predicts that the NOV78a protein contains the domains shown in
Table 78E.



Table 78E. Domain Analysis of NOV78a 




Pfam Domain 


NOV78a Match Region 



Identities/ 
Similarities 

for the Matched Region 


Expect Value 


DUF25


12..63 


18/52 (35%) 
34/52 (65%) 


0.00072



Example 79.

The NOV79 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 79A. 



Table 79A. NOV79 Sequence Analysis

SEQ ID NO: 275 | 1044 bp

NOV79a, CG141395-01 DNA Sequence


CACGTCCCCTGTGCGGCCAGCGTCAGAGCCATGGCGATGGAGGAGAGGAAGCCCGAGACCGAG 


GCAACGAGAGCACAGCCGACCCCTTCGTCATCCACCACTCAGAGCAAGCCTACGCCCGTGAAG
CCAAACTATGCTCTCAAGTTCACCCTTGCTGGCCACACCAAAGCAGTGTCCTCCGTGAAATTC 
AGCCCGAATGGAGAGTGGCTGGCAAGTTCATCTGCTGATAAACTCATTAAAATTTGGGGGTCA 
TATGATGGGAAATTTGAGAAAACCATGTCTGGTCACAGCCTGTGGTCGTCAGATTCTAACCTT 
TTTGTTTCCGCCTCAGATGACAAAACCTTGAAGATACGGGACGTGAGCTCGGGAAAGTGTCTG 
AAAACCCTGAAGGGACACAGTAATTATGTCTTTTGCTGTAACTTCAATCCCCAGTCCAGCCTT 
ACTGTCTCAGGATCCTTTGATGAAAGTGTGAGGATATGGGTTGTGAAAACAGGGAAGTGCCAC 
AAGACTCTGCTAGCTCACTCCGATCCAGTCTCGGCCATTCATTTTAATCGTGATGGATTCTTG 
ATAGTTTCAAGTAGCTATGATGGTCTCTGTCACATCTGGGACACCGCCTCAGGCCAGTGCCTG 
AAAACGCTCACTGATGATGACAACCCCCTGGTGTCTTTCGTGAAGCTCTCCCCGAAGGGTGGA 
TACATCGTGGCTGCCACGCTGGGCAACACACTCAAGCTCTGGGACTACAGCAAGGGGAAGTGC
CTGAAGACATACACTGGCCACAAGAACGAGAAATACTGCATATTTGCTAATTTCTCTGTTACT 
GGCGGGAAGTGGATTGTGTCTGGCTCGGAGGATAACCTTCTTTACATCTGGAAACTTCAGACG
AAAGAGATTGTACAGAAATTAGAAGGCCACACAGATGTTGTGACCTCAACAGCTTGTCACCCA 
ACAGAAAACATCATCACCTCTGCCGCGCTAGAAAATGACAAAACAATTAAACTGTGGAAGAGT 
GACTGTTAAGTCCCTTTGCTCCCACATGCGATAGAC 


ORF Start: ATG at 31 | ORF Stop: TAA at 1015

SEQ ID NO: 276 | 328 aa | MW at 36088.6kD

NOV79a, CG141395-01 Protein Sequence


MAMEERKPETEATRAQPTPSSSTTQSKPTPVKPNYALKFTLAGHTKAVSSVKFSPNGEWLASS 
SADKLIKIWGSYDGKFEKTMSGHSLWSSDSNLFVSASDDKTLKIRDVSSGKCLKTLKGHSNYV 
FCCNFNPQSSLTVSGSFDESVRIWVVKTGKCHKTLLAHSDPVSAIHFNRDGFLIVSSSYDGLC
HIWDTASGQCLKTLTDDDNPLVSFVKLSPKGGYIVAATLGNTLKLWDYSKGKCLKTYTGHKNE 
KYCIFANFSVTGGKWIVSGSEDNLLYIWKLQTKEIVQKLEGHTDVVTSTACHPTENIITSAAL 
ENDKTIKLWKSDC 
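
The ORF coordinates and the protein length reported in Table 79A can be cross-checked with simple arithmetic. The sketch below assumes, as the table entries suggest, that "ORF Start" gives the 1-based position of the A of the ATG codon and "ORF Stop" gives the 1-based position of the first base of the stop codon; the function name `orf_codon_count` is hypothetical and not part of the original analysis.

```python
def orf_codon_count(start: int, stop: int) -> int:
    """Return the number of sense codons in an open reading frame.

    Assumption: `start` is the 1-based position of the A in the ATG start
    codon and `stop` is the 1-based position of the first base of the stop
    codon, which is how the ORF Start / ORF Stop entries appear to be given.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError(
            f"start={start}, stop={stop} do not delimit a whole number of codons"
        )
    return span // 3


# Values copied from Table 79A (NOV79a): ATG at 31, TAA at 1015.
nov79a_codons = orf_codon_count(31, 1015)
print(nov79a_codons)  # 328, matching the 328 aa reported for SEQ ID NO: 276
```

Under the same assumption, the coordinates reported for the other clones in this section can be checked in the same way against their stated protein lengths.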




Further analysis of the NOV79a protein yielded the following properties shown in 
Table 79B. 



Table 79B. Protein Sequence Properties NOV79a

PSort analysis: 0.4500 probability located in cytoplasm; 0.4206 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV79a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 79C.
Table 79C. Geneseq Results for NOV79a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV79a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value
ABB97345 | Novel human protein SEQ ID NO: 613 - Homo sapiens, 334 aa. [WO200222660-A2, 21-MAR-2002] | 1..328 / 1..334 | 299/334 (89%) / 307/334 (91%) | e-176
AAB68529 | Human GTP-binding associated protein #29 - Homo sapiens, 334 aa. [WO200105970-A2, 25-JAN-2001] | 1..328 / 1..334 | 299/334 (89%) / 307/334 (91%) | e-176
AAB63186 | Human secreted protein sequence encoded by gene 3 SEQ ID NO: 112 - Homo sapiens, 317 aa. [WO200061629-A1, 19-OCT-2000] | 17..327 / 1..317 | 285/317 (89%) / 292/317 (91%) | e-168
ABB68576 | Drosophila melanogaster polypeptide SEQ ID NO 32520 - Drosophila melanogaster, 361 aa. [WO200171042-A2, 27-SEP-2001] | 15..327 / 42..360 | 260/319 (81%) / 279/319 (86%) | e-154
AAB93659 | Human protein sequence SEQ ID NO: 13175 - Homo sapiens, 330 aa. [EP1074617-A2, 07-FEB-2001] | 9..327 / 5..329 | 253/325 (77%) / 275/325 (83%) | e-149



In a BLAST search of public sequence databases, the NOV79a protein was found to
have homology to the proteins shown in the BLASTP data in Table 79D.


Table 79D. Public BLASTP Results for NOV79a

Protein Accession Number | Protein/Organism/Length | NOV79a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value
Q9UGP9 | WD-repeat protein 5 (WD repeat protein BIG-3) - Homo sapiens (Human), and, 334 aa. | 1..328 / 1..334 | 299/334 (89%) / 307/334 (91%) | e-176
Q8T776 | Hypothetical 38.6 kDa protein - Branchiostoma floridae (Florida lancelet) (Amphioxus), 353 aa. | 1..327 / 1..352 | 277/352 (78%) / 298/352 (83%) | e-160
Q9V3J8 | Will die slowly protein - Drosophila melanogaster (Fruit fly), 361 aa. | 15..327 / 42..360 | 260/319 (81%) / 279/319 (86%) | e-153
Q9NUL4 | CDNA FLJ11287 fis, clone PLACE1009596, weakly similar to vegetatible incompatibility protein HET-E-1 - Homo sapiens (Human), 330 aa. | 9..327 / 5..329 | 253/325 (77%) / 275/325 (83%) | e-148
Q9D7H2 | 2310009C03Rik protein - Mus musculus (Mouse), 328 aa. | 12..328 / 8..328 | 235/323 (72%) / 271/323 (83%) | e-139





PFam analysis predicts that the NOV79a protein contains the domains shown in the 
Table 79E. 



Table 79E. Domain Analysis of NOV79a

Pfam Domain | NOV79a Match Region | Identities / Similarities for the Matched Region | Expect Value
WD40 | 37..73 | 17/37 (46%) / 30/37 (81%) | 4.3e-06
WD40 | 115..151 | 17/37 (46%) / 31/37 (84%) | 8.2e-06
WD40 | 157..193 | 15/37 (41%) / 29/37 (78%) | 9.4e-06
WD40 | 242..281 | 10/40 (25%) / 32/40 (80%) | 0.04
WD40 | 287..325 | 12/39 (31%) / 29/39 (74%) | 0.18
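
As a consistency check on Table 79E, the length of each NOV79a match region can be compared with the denominator of the corresponding identity fraction. The sketch below is illustrative only: the helper name `span_length` is hypothetical, and it assumes the "a..b" ranges are inclusive 1-based residue positions. Gapped alignments can make the alignment length differ from the query span, so agreement is expected here but not guaranteed in general.

```python
def span_length(region: str) -> int:
    """Return the number of residues covered by an inclusive 'start..end' range."""
    start_text, end_text = region.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"range {region!r} ends before it starts")
    return end - start + 1


# NOV79a WD40 match regions copied from Table 79E.
wd40_regions = ["37..73", "115..151", "157..193", "242..281", "287..325"]
lengths = [span_length(region) for region in wd40_regions]
print(lengths)  # [37, 37, 37, 40, 39], the denominators listed in the table
```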



Example 80. 

The NOV80 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 80A. 



Table 80A. NOV80 Sequence Analysis

SEQ ID NO: 277 | 1602 bp

NOV80a, CG191018-01 DNA Sequence


GGAGCCATGCGGCGATCGAGGAGCTCTGCGGCCGCCAAGCTGCGCGGGCAGAAGCGGTCCGGG 
GCCTCCGCGGCCCCCGCGGCCTCCGCGGCCGCTGCCTTGGCACCCAGCGCCACCCGCACACGG 
CGCTCCGCTAGCCAGGCCGGGAGCAAGAGCCAGGCGGTGGAGAAGCCGCCGTCGGAGAAGCCG 
CGGCTGAGGCGCTCGTCGCCGCGGGCCCAGGAGGAGGGCCCGGGGGAGCCGCCGCCGCCTGAG 
CTGGCGTTGCTCCCGCCACCGCCGCCGCCGCCGCCGACTCCCGCGACCCCGACGTCCTCGGCG 
TCCAACCTGGACCTGGGCGAGCAGCGGGAGCGCTGGGAGACGTTCCAGAAGCGGCAGAAGCTT 
ACCTCCGAGGGTGCCGCCAAGCTCCTGCTAGACACCTTTGAATACCAGGGCCTGGTGAAGCAC 
ACAGGAGGCTGCCACTGTGGAGCAGTTCGTTTTGAAGTTTGGGCCTCAGCAGACTTGCATATA 
TTTGACTGCAATTGCAGCATTTGCAAGAAGAAGCAGAATAGACACTTCATTGTTCCAGCTTCT 
CGCTTCAAGCTCCTGAAGGGAGCTGAGCACATAACGACTTACACGTTCAATACTCACAAAGCC 
CAGCATACCTTCTGTAAGAGATGTGGCGTTCAGAGCTTCTATACTCCACGATCAAACCCCGGA 
GGCTTCGGAATTGCCCCCCACTGCCTGGATGAGGGCACTGTGCGGAGTATGGTCACTGAGGAA
TTCAATGGCAGCGATTGGGAGAAGGCCATGAAAGAGCACAAGACCATCAAGAACATGTCTAAA 
GAGTGAGCTTCTGCCTCTCCTGCCCTGAAAAGGAGGAATGATTGGGGCCAGCAACTTTGCTCT 


CCCTGCCGTGCCTCGGTGGTGCTCCTGAATGTGGCTGACCTGGGCTGCTGGTTCCGTTGACTA 


GGGTCATCTTGATCTCTGCAGTTTGCTCCAGCTACCAGTTTCTTTAGGCAGCTCTTTGTCCTC 


CCTCTGCCCAGATTTTGATGTAGTCTAATTGACATCCTTCTCTTCCCAACTTTTGTGTGATCC 


AGCAGAGCATGTGAGACTCTTTGATATGCACCTTCATGTATTATCTTGTTCAGTTCTCTGAGG 


TTGGGATCATTATTATTTCCCATTTTGCAGATGAGAGAATTGAGGCAGAGAAAGGTTCAGCAC 


CTTGCCTTTGGTTACACAGCTGGTCATTCTGGCTTCAATCGCAGGACTACCAGCCTGTGCTCT 

TCACCACTTAGCTTCCCTGACTCAGGCCACTTCCCTGGAGCGTTAGCTGGATTCTGAGAGTAG 


TTTCCAAGCCAGAGCTTTCAGAGAGCTTTTGTTCGTAGGACAATTTTAAGACATCAGGTTCTT 


GAATGTTTTGTGTTTTTTTAAGTCTCAGATTTATCTTCCTACTTCCTACTTCTCCAAAAAGAC 


TGAGAGCTGACATATTTGATTGTAAGCTCTTTGAGGCAGAGTTCTTGTAATCGTCTCTGTATA 


AAACAGTGCCCACCCCAGTGACCTGTACTTGGATGCTTCAATCAGAGCTGTCCTGTTAAATAG 


AGCAAGTTTTTCCTAGACCCACATTCT 


ORF Start: ATG at 7 | ORF Stop: TGA at 823

SEQ ID NO: 278 | 272 aa | MW at 29730.3kD

NOV80a, CG191018-01 Protein Sequence


MRRSRSSAAAKLRGQKRSGASAAPAASAAAALAPSATRTRRSASQAGSKSQAVEKPPSEKPRL 
RRSSPRAQEEGPGEPPPPELALLPPPPPPPPTPATPTSSASNLDLGEQRERWETFQKRQKLTS 
EGAAKLLLDTFEYQGLVKHTGGCHCGAVRFEVWASADLHIFDCNCSICKKKQNRHFIVPASRF
KLLKGAEHITTYTFNTHKAQHTFCKRCGVQSFYTPRSNPGGFGIAPHCLDEGTVRSMVTEEFN 
GSDWEKAMKEHKTIKNMSKE 



Further analysis of the NOV80a protein yielded the following properties shown in 
Table 80B. 



Table 80B. Protein Sequence Properties NOV80a

PSort analysis: 0.7941 probability located in mitochondrial matrix space; 0.6305 probability located in mitochondrial intermembrane space; 0.4722 probability located in mitochondrial inner membrane; 0.4722 probability located in mitochondrial outer membrane

SignalP analysis: No Known Signal Sequence Predicted



5 



A search of the NOV80a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 80C. 



Table 80C. Geneseq Results for NOV80a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date) 


NOV80a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAB43357 


Human ORFXORF3 121 polypeptide 
sequence SEQ 1DN0:6242 - Homo 
sapiens, 245 aa. [WO200058473-A2, 
05-OCT-2000] 


32..272 
5..245 


234/24 1 (97%) 
235/241 (97%) 


e-140 


AAM38640 


Human colorectal cancer antigen SEQ 
ID NO: 1 55 - Homo sapiens, 1 94 aa. 
[ WO200 1 55350-A 1 , 02-AUG-200 1 ] 


19.212 
1 ..194 


187/194 (96%) 
187/194 (96%) 


e-ll 1 


AAM93169 


Human digestive system antigen SEQ 
ID NO: 2518 • Homo sapiens, 194 aa. 
[ WO200 1 553 1 4-A2, 02-A UG-200 1 ] 


19.212 
1..I94 


187/194 (96%) 
187/194 (96%) 


e-lll 


ABB77648 


Ribosomal protein si 3.09 amino acid 


169.. 272 
15..118 


103/104(99%) 
104/104(99%) 


5e-59 
[CN1326940-A, 19-DEC-2001]




! 


AAM78645 


Human protein SEQ ID NO 1307 - 

Homo sapiens, 1 1 8 aa. 

[ WO200 1571 90-A2, 09- AUG-200 1 ] 


I69..272 
15..1 18 


103/104(99%) j 5e-59 
104/104(99%) | 



In a BLAST search of public sequence databases, the NOV80a protein was found to
have homology to the proteins shown in the BLASTP data in Table 80D.



Table 80D. Public BLASTP Results for NOV80a

Protein Accession Number | Protein/Organism/Length | NOV80a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value
AAM76703 | Nuclear protein p30 - Homo sapiens (Human), 271 aa (fragment). | 2..272 / 1..271 | 271/271 (100%) / 271/271 (100%) | e-160
Q9CXS4 | 3110013H01Rik protein - Mus musculus (Mouse), 252 aa. | 1..272 / 1..252 | 200/272 (73%) / 213/272 (77%) | e-108
AAM29678 | Hypothetical 14.6 kDa protein - Caenorhabditis elegans, 129 aa. | 143..261 / 4..122 | 63/119 (52%) / 77/119 (63%) | 2e-32
Q9LFK7 | Hypothetical 15.0 kDa protein - Arabidopsis thaliana (Mouse-ear cress), 135 aa. | 142..272 / 5..135 | 61/132 (46%) / 86/132 (64%) | 2e-30
Q9AMY1 | ID747 - Bradyrhizobium japonicum, 407 aa. | 143..259 / 267..383 | 53/117 (45%) / 75/117 (63%) | 8e-28



PFam analysis predicts that the NOV80a protein contains the domains shown in the
Table 80E.



Table 80E. Domain Analysis of NOV80a

Pfam Domain | NOV80a Match Region | Identities / Similarities for the Matched Region | Expect Value







Example 81.

The NOV81 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 81A.
Table 81A. NOV81 Sequence Analysis

SEQ ID NO: 279 | 1051 bp

NOV81a, CG56125-01 DNA Sequence


CCGCTGCTACCCGGCATGTCGGCGGAGGCCTCGGGCCCGGCTGCCGCrGCC;r,rrrrr;TrrrTr: 
GAAGCCCCCAAGCCCTCGGGTCTCGAGCCTGGCCCCGCCGCCTACGGTCTCAAGCCGCTGACC 
CCGAACAGCAAATACGTGAAGCTGAACGTGGGCGGCTCGTTGCACTACACCACGCTGCGCACC 
CTCACGGGACAGGACACCATGCTCAAAGCCATGTTCAGCGGCCGCGTGGAGGTGCTGACCGAT 
GCCGGAGGTTGGGTGCTGATTGACCGGAGCGGCCGTCACTTTGGTACAATCCTCAATTACCTG 
CGGGATGGGTCTGTGCCACTGCCGGAGAGTACGAGAGAACTGGGGGAGCTGCTGGGCGAAGCA 
CGCTACTACCTGGTGCAGGGCCTGATTGAGGACTGCCAGCTGGCGCTGCAGCAAAAAAGGGAG 
ACGCTGTCCCCGCTGTGCCTCATCCCCATGGTGACATCTCCCCGGGAGGAGCAGCAGCTCCTG 
GCCAGCACCTCCAAGCCCGTGGTGAAGCTCCTGCACAACCGCAGTAACAACAAGTACTCCTAC 
ACCAGCACTTCAGATGACAACCTACTTAAGAACATCGAGCTGTTCGACAAGCTGGCCCTGCGC 
TTCCACGGGCGGCTACTCTTCCTCAAGGATGTCCTGGGGGACGAGATCTGCTGCTGGTCTTTC 
TACGGGCAGGGCCGCAAAATCGCCGAGGTGTGCTGCACCTCCATTGTCTATGCTACGGAGAAG 
AAGCAGACCAAGGTGGAATTTCCAGAGGCCCGGATCTTCGAGGAGACCCTGAACATCCTCATC 
TACGAGACTCCCCGGGGCCCAGACCCAGCCCTCCTGGAGGCCACAGGGGGAGCAGCTGGAGCT 
GGTGGGGCTGGCCGCGGGGAGGATGAAGAGAACCGAGAGCACCGTGTCCGCAGGATCCATGTC 
CGGCGCCATATCACCCACGACGAGCGTCCTCATGGCCAACAAATTGTCTTCAAGGACTGACCT 
CTGACCCTCCCCCTGCCTTCCTCTTGCCTTGGGACCCAGTCCC 




ORF Start: ATG at 16 


ORF Stop: TGA at 1003 




SEQ ID NO: 280 | 329 aa | MW at 36357.0kD

NOV81a, CG56125-01 Protein Sequence


MSAEASGPAAAAAPSLEAPKPSGLEPGPAAYGLKPLTPNSKYVKLNVGGSLHYTTLRTLTGQD 
TMLKAMFSGRVEVLTDAGGWVLIDRSGRHFGTILNYLRDGSVPLPESTRELGELLGEARYYLV
QGLIEDCQLALQQKRETLSPLCLIPMVTSPREEQQLLASTSKPVVKLLHNRSNNKYSYTSTSD
DNLLKNIELFDKLALRFHGRLLFLKDVLGDEICCWSFYGQGRKIAEVCCTSIVYATEKKQTKV 
EFPEARIFEETLNILIYETPRGPDPALLEATGGAAGAGGAGRGEDEENREHRVRRIHVRRHIT 
HDERPHGQQIVFKD 




SEQ ID NO: 281 | 852 bp

NOV81b, CG56125-02 DNA Sequence


TTTTTTTTAGTTTCTCAACTTAACTTTATTTCAATAATTTAATAGAAAATTAAAATAATAAAT 


AATATGAAACAGACTGATAACGCTGAGCTGGGCAGGCCCAGGCCAGTCTAGTACAAAGTTAAG 


GAGGTAGGGAGGATGGTGGGGAGGAGGGGGCGGACTACCCTGCAGGACGCGGGAGGCTGCTCA 


GACTGTGGTGATGTCAGGAAGGGCCGCACACTTTGGCATGGACGATGCACTAAAAAAAGAGAA


AGGGAATTCTAAATCCCTCTTAACCAGCTGGAGAGGGAAGGACGCAGGGCCAGGGTGGGGACA 


AGTGTTGGCTTCGGAAGGCTCTGAGTGGTGGGGCCGGAATGTACCATGTTGTTAGCAATGGGG 


TTGGGACGGGTGGAGAAGGGCCAAAGTGAGCTGTGCCATGCAATGAAGGGACAGAGGAGGACC 
CACGACTTGGCCAGCAGAGCCGGGGCAAAAGTCTGGGAAGGGGAGGGAAAGAGAGAGGGACTG
GGTCCCAAGGCAAGAGGAAGGCAGGGGGAGGGTCAGAGGTCAGTCCTTGAAGACAATTTGTTG 
GCCATGAGGACGCTCGTCGTGGGTGATATGGCGCCGGACATGGATCCTGCGGACACGGTGCTC 
TCGGTTCTCTTCATCCTCCCCGCGGCCAGCCCCACCAGCTCCAGCTGCTCCCCCTGTGGCCTC 
CAGGAGGGCTGGGTCTGGGCCCCGGGGAGTCTCGTAGATGAGGATGTTCAGGGTCTCCTCGAA 
GATCCGGGCCTCTGGAAATTCCACCTGCAAAAGGCCAGCCGGCCCCAGCTCCTTCCTTCTGGT 
GCAGTCACGCAGGGCCATCCCCCTGCCTAGGNN 




ORF Start: ATG at 361 | ORF Stop: TAG at 847

SEQ ID NO: 282 | 162 aa | MW at 17258.6kD

NOV81b, CG56125-02 Protein Sequence


MLLAMGLGRVEKGQSELCHAMKGQRRTHDLASRAGAKVWEGEGKREGLGPKARGRQGEGQRSV 
LEDNLLAMRTLVVGDMAPDMDPADTVLSVLFILPAASPTSSSCSPCGLQEGWVWAPGSLVDED
VQGLLEDPGLWKFHLQKASRPQLLPSGAVTQGHPPA 



Sequence comparison of the above protein sequences yields the following sequence
relationships shown in Table 81B.



Table 81B. Comparison of NOV81a against NOV81b. 


Protein Sequence 


NOV81a Residues/ 
Match Residues 


Identities/ 

Similarities for the Matched Region 
NOV81b


126.132 


6/7 (85%) 




127.. 133 


111 (99%) 



Further analysis of the NOV81a protein yielded the following properties shown in
Table 81C.



Table 81C. Protein Sequence Properties NOV81a

PSort analysis: 0.6500 probability located in cytoplasm; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



5 



A search of the NOV81a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 81D.



Table 81D. Geneseq Results for NOV81a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV81a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value
AAM39908 | Human polypeptide SEQ ID NO 3053 - Homo sapiens, 329 aa. [WO200153312-A1, 26-JUL-2001] | 1..329 / 1..329 | 329/329 (100%) / 329/329 (100%) | 0.0
ABB06073 | Human NS protein sequence SEQ ID NO: 165 - Homo sapiens, 323 aa. [WO200206315-A2, 24-JAN-2002] | 34..324 / 34..323 | 213/293 (72%) / 253/293 (85%) | e-123
AAB94285 | Human protein sequence SEQ ID NO: 14723 - Homo sapiens, 310 aa. [EP1074617-A2, 07-FEB-2001] | 34..324 / 22..310 | 213/293 (72%) / 252/293 (85%) | e-121
AAM94003 | Human stomach cancer expressed polypeptide SEQ ID NO 76 - Homo sapiens, 310 aa. [WO200109317-A1, 08-FEB-2001] | 34..324 / 22..310 | 213/293 (72%) / 252/293 (85%) | e-121
AAB42647 | Human ORFX ORF2411 polypeptide sequence SEQ ID NO:4822 - Homo sapiens, 195 aa. [WO200058473-A2, 05-OCT-2000] | 72..252 / 1..181 | 181/181 (100%) / 181/181 (100%) | e-102

In a BLAST search of public sequence databases, the NOV81a protein was found to
have homology to the proteins shown in the BLASTP data in Table 81E.



Table 81E. Public BLASTP Results for NOV81a

Protein Accession Number | Protein/Organism/Length | NOV81a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value
Q8WZ19 | Hypothetical 36.4 kDa protein - Homo sapiens (Human), 329 aa. | 1..329 / 1..329 | 329/329 (100%) / 329/329 (100%) | 0.0
Q96SA1 | TNFAIP1-like protein - Homo sapiens (Human), 329 aa. | 1..329 / 1..329 | 327/329 (99%) / 328/329 (99%) | 0.0
Q96P93 | Polymerase delta-interacting protein 1 - Homo sapiens (Human), 329 aa. | 1..329 / 1..329 | 321/329 (97%) / 325/329 (98%) | 0.0
Q96SU0 | CDNA FLJ14637 fis, clone NT2RP2001327, moderately similar to tumor necrosis factor, alpha-induced protein 1 - Homo sapiens (Human), 310 aa. | 34..324 / 22..310 | 213/293 (72%) / 252/293 (85%) | e-121
Q9H3F6 | MSTP028 - Homo sapiens (Human), 313 aa. | 34..324 / 25..313 | 213/293 (72%) / 252/293 (85%) | e-121
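
The Expect values in these tables mix ordinary decimals such as 0.0 with the BLAST shorthand "e-121", which is conventionally read as 1e-121. A minimal sketch of how such entries could be normalised for sorting or filtering is shown below; the function name `parse_expect` and the small `hits` dictionary are illustrative assumptions, with the accession numbers and values copied from Table 81E.

```python
def parse_expect(text: str) -> float:
    """Convert a BLAST Expect value string to a float.

    Assumption: a value written without a mantissa, such as 'e-121',
    stands for 1e-121.
    """
    cleaned = text.strip().lower()
    if cleaned.startswith("e"):
        cleaned = "1" + cleaned
    return float(cleaned)


# Two rows from Table 81E, keyed by accession number.
hits = {"Q96SU0": "e-121", "Q8WZ19": "0.0"}
ranked = sorted(hits, key=lambda accession: parse_expect(hits[accession]))
print(ranked)  # most significant hit (smallest Expect value) first
```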



PFam analysis predicts that the NOV81a protein contains the domains shown in the
Table 81F.



Table 81F. Domain Analysis of NOV81a

Pfam Domain | NOV81a Match Region | Identities / Similarities for the Matched Region | Expect Value
K_tetra | 41..138 | 35/111 (32%) / 71/111 (64%) | 2.8e-27



Example 82. 

The NOV82 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 82A.



Table 82A. NOV82 Sequence Analysis

SEQ ID NO: 283 | 630 bp

NOV82a, CG57113-01 DNA Sequence

GTTCGGGAGCCGCGGCTTATGGTGCAGACATGGCCAAGTCCAAGAACCACACCACACACAACC
AGTCCCGAAAATGGCACAGAAATGGTATCAAGAAACCCCGATCACAAAGATACGAATCTCTTA
AGGGGGTGGACCCCAAGTTCCTGAGGAACATGCGCTTTGCCAAGAAGCACAACAAAAAGGGCC
TAAAGAAGATGCAGGCCAACAATGCCAAGGCCATGAGTGCACGTGCCGAGGCTATCAAGGCCC 
TCGTAAAGCCCAAGGAGGTTAAGCCCAAGATCCCAAAGGGTGTCAGCCGCAAGCTCGATCGAC 
TTGCCTACATTGCCCACCCCAAGCTTGGGAAGCGTGCTCGTGCCCGTATTGCCAAGGGGCTCA 
GGCTGTGCCGGCCAAAGGCCAAGGCCAAGGCCAAGGCCAAGGATCAAACCAAGGCCCAGGCTG 
CAGCCCCAGCTTCAGTTCCAGCTCAGGCTCCCAAACGTACCCAGGCCCCTACAAAGGCTTCAG 
AGTAGATATCTCTGCCAACATGAGGACAGAAGGACTGGTGCGACCCCCCACCCCCGCCCCTGG
GCTACCATCTGCATGGGGCTGGGGTCCTCCTGTGCTACTGGTACAAATAAACCTGAGGCAGGA




ORF Start: ATG at 30 




ORF Stop: TAG at 507 


SEQ ID NO: 284 | 159 aa | MW at 17751.9kD

NOV82a, CG57113-01 Protein Sequence


MAKSKNHTTHNQSRKWHRNGIKKPRSQRYESLKGVDPKFLRNMRFAKKHNKKGLKKMQANNAK 
AMSARAEAIKALVKPKEVKPKIPKGVSRKLDRLAYIAHPKLGKRARARIAKGLRLCRPKAKAK 
AKAKDQTKAQAAAPASVPAQAPKRTQAPTKASE 
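
The relationship between the NOV82a DNA sequence and its encoded polypeptide can be illustrated by translating the start of the ORF with the standard genetic code. The sketch below is not the method used in the original analysis; the names `CODON_TABLE` and `translate` are hypothetical, and only the first 63 nucleotides of SEQ ID NO: 283 (copied from Table 82A) are used, with the ORF start at position 30 as reported.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, orf_start: int) -> str:
    """Translate from the 1-based `orf_start` until a stop codon or the
    last complete codon of `dna`, whichever comes first."""
    sequence = dna.upper()
    residues = []
    for index in range(orf_start - 1, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[index:index + 3]]
        if amino_acid == "*":  # stop codon ends the ORF
            break
        residues.append(amino_acid)
    return "".join(residues)


# First line of the NOV82a DNA sequence (SEQ ID NO: 283); ORF Start: ATG at 30.
first_line = "GTTCGGGAGCCGCGGCTTATGGTGCAGACATGGCCAAGTCCAAGAACCACACCACACACAACC"
print(translate(first_line, 30))  # MAKSKNHTTHN, the first residues of SEQ ID NO: 284
```

The same check, applied to the full nucleotide sequence, is one way to spot character-recognition errors in either the DNA or the protein listing.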




SEQ ID NO: 285 | 600 bp

NOV82b, CG57113-03 DNA Sequence


CGCGGCTTATGGTGCAGACATGGCCAAGTCCAAGAACCACACCACACACAACCAGTCCCGAAA 


ATGGCACAGAAATGGTATCAAGAAACCCCGATCACAAAGATACGAATCTCTTAAGGGGGTGGA 
CCCCAAGTTCCTGAGGAACATGCGCTTTGCCAAGAAGCACAACAAAAAGGGCCTAAAGAAGAT 
GCAGGCCAACAATGCCAAGGCCATGAGTGCACGTGCCGAGGCTATCAAGGCCCTCGTAAAGCC 
CAAGGAGGTTAAGCCCAAGATCCCAAAGGGTGTCAGCCGCAAGCTCGATCGACTTGCCTACAT 
TGCCCACCCCAAGCTTGGGAAGCGTGCTCGTGCCCGTATTGCCAAGGGGCTCAGGCTGTGCCG 
GCCAAAGGCCAAGGCCAAGGCCAAGGCCAAGGCCAAGGATCAAACCAAGGCCCAGGCTGCAGC 
CCCAGCTTCAGTTCCAGCTCAGGCTCCCAAACGTACCCAGGCCCCTACAAAGGCTTCAGAGTA
GATATCTCTGCCAACATGAGGACAGAAGGACTGGTGCGACCCCCCACCCCCGCCCCTGGGCTA
CCATCTGCATGGGGCTGGGGTCCTCCTGTGCTA


ORF Start: ATG at 20 | ORF Stop: TAG at 503

SEQ ID NO: 286 | 161 aa | MW at 17951.1kD

NOV82b, CG57113-03 Protein Sequence


MAKSKNHTTHNQSRKWHRNGIKKPRSQRYESLKGVDPKFLRNMRFAKKHNKKGLKKMQANNAK 
AMSARAEAIKALVKPKEVKPKIPKGVSRKLDRLAYIAHPKLGKRARARIAKGLRLCRPKAKAK 
AKAKAKDQTKAQAAAPASVPAQAPKRTQAPTKASE 




SEQ ID NO: 287 | 579 bp

NOV82c, CG57113-02 DNA Sequence


ACTCACTATAGGGCTCGAGCGGCGCTTCGGGAGCCGCGGCTTATGGTGCAGACATGGCCAAGT 


CCAAGAACCACACCACACACAACCAGTCCCGAAAATGGCACAGAAATGGTATCAAGAAACCCC 
GATCACAAAGATACGAATCTCTTAAGGGGGTGGACCCCAAGTTCCTGAGGAACATGCGCTTTG 
CCAAGAAGCACAACAAAAAGGGCCTAAAGAAGATGCAGGCCAACAATGCCAAGGCCATGAGTG
CACGTGCCGAGGCTATCAAGGCCCTCGTAAAGCCCAAGGAGGTTAAGCCCAAGATCCCAAAGG
GTGTCAGCCGCAAGCTCGATCGACTTGCCTACATTGCCCACCCCAAGCTTGGGAAGCGTGCTC 
GTGCCCGTATTGCCAAGGGGCTCAGGCTGTGCCGGCCAAAGGCCAAGGCCAAGGCCAAGGCCA 
AGGCCAAGGATCAAACCAAGGCCCAGGCTGCAGCCCCAGCTTCAGTTCCAGCTCAGGCTCCCA 
AACGTACCCAGGCCCCTACAAAGGCTTCAGAGTAGATATCTCTGCCAACATGAGGACAGAAGG 




GCCTGGTGTCCA 










ORF Start: ATG at 54 | ORF Stop: TAG at 537

SEQ ID NO: 288 | 161 aa | MW at 17951.1kD

NOV82c, CG57113-02 Protein Sequence


MAKSKNHTTHNQSRKWHRNGIKKPRSQRYESLKGVDPKFLRNMRFAKKHNKKGLKKMQANNAK
AMSARAEAIKALVKPKEVKPKIPKGVSRKLDRLAYIAHPKLGKRARARIAKGLRLCRPKAKAK 
AKAKAKDQTKAQAAAPASVPAQAPKRTQAPTKASE 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 82B. 
Table 82B. Comparison of NOV82a against NOV82b and NOV82c.

Protein Sequence | NOV82a Residues / Match Residues | Identities / Similarities for the Matched Region
NOV82b | 1..159 / 1..161 | 116/161 (72%) / 116/161 (72%)
NOV82c | 1..159 / 1..161 | 116/161 (72%) / 116/161 (72%)



Further analysis of the NOV82a protein yielded the following properties shown in 
Table 82C. 



Table 82C. Protein Sequence Properties NOV82a

PSort analysis: 0.9840 probability located in nucleus; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



5 

A search of the NOV82a protein against the Geneseq database, a proprietary database 
that contains sequences published in patents and patent publication, yielded several 
homologous proteins shown in Table 82D. 



Table 82D. Geneseq Results for NOV82a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV82a Residues / Match Residues | Identities / Similarities for the Matched Region | Expect Value
AAW71709 | Heparan sulfate/heparin interacting protein - Homo sapiens, 159 aa. [WO9838214-A1, 03-SEP-1998] | 1..159 / 1..159 | 159/159 (100%) / 159/159 (100%) | 4e-87
AAM80294 | Human protein SEQ ID NO 3946 - Homo sapiens, 167 aa. [WO200157190-A2, 09-AUG-2001] | 1..159 / 1..167 | 159/167 (95%) / 159/167 (95%) | 6e-85
ABG26996 | Novel human diagnostic protein #26987 - Homo sapiens, 223 aa. [WO200175067-A2, 11-OCT-2001] | 1..149 / 12..156 | 144/149 (96%) / 144/149 (96%) | 6e-77
ABG11437 | Novel human diagnostic protein #11428 - Homo sapiens, 189 aa. [WO200175067-A2, 11-OCT-2001] | 5..159 / 32..189 | 146/158 (92%) / 149/158 (93%) | 2e-76
ABG07644 | Novel human diagnostic protein #7635 - Homo sapiens, 571 aa. [WO200175067-A2, 11-OCT-2001] | 1..154 / 1..150 | 139/154 (90%) / 141/154 (91%) | 1e-72


In a BLAST search of public sequence databases, the NOV82a protein was found to
have homology to the proteins shown in the BLASTP data in Table 82E.


Table 82E. Public BLASTP Results for NOV82a

Protein Accession Number | Protein/Organism/Length | NOV82a Residues / Match Residues | Identities / Similarities for the Matched Portion | Expect Value
AAH08926 | Ribosomal protein L29 - Homo sapiens (Human), 159 aa. | 1..159 / 1..159 | 159/159 (100%) / 159/159 (100%) | 1e-86
P47914 | 60S ribosomal protein L29 (Cell surface heparin binding protein HIP) - Homo sapiens (Human), 158 aa. | 2..159 / 1..158 | 158/158 (100%) / 158/158 (100%) | 4e-86
Q9XS36 | Ribosomal protein L29/heparin/heparan sulfate interacting protein - Sus scrofa (Pig), 160 aa. | 1..159 / 1..160 | 137/160 (85%) / 143/160 (88%) | 4e-72
R6RT43 | ribosomal protein L29, cytosolic [validated] - rat, 156 aa. | 1..157 / 1..155 | 127/157 (80%) / 137/157 (86%) | 3e-67
P25886 | 60S ribosomal protein L29 (P23) - Rattus norvegicus (Rat), 155 aa. | 2..157 / 1..154 | 126/156 (80%) / 136/156 (86%) | 1e-66



5 

PFam analysis predicts that the NOV82a protein contains the domains shown in the 
Table 82F. 



Table 82F. Domain Analysis of NOV82a

Pfam Domain | NOV82a Match Region | Identities / Similarities for the Matched Region | Expect Value
Ribosomal_L29e | 3..42 | 31/40 (78%) / 40/40 (100%) | 2.3e-19



Example 83.

The NOV83 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 83A. 



WO 03/023002 




PCT/US02/28539



Table 83A. NOV83 Sequence Analysis

SEQ ID NO: 289 | 1409 bp

NOV83a, CG59536-01 DNA Sequence


GGCACAGCCCAGGAACGTTGCTTTGGAGAATCCTGCAGATAAGGCTTTTCCAAAAAGCGCGAG 


CATCTTGTTGTATTCAGATACCCTATCGTCGTCAGTCATGGCTAGCATCACTGCGTGTGTGGG 


TAACAGCAGGCAGCAGAATGCACCTTTGCCGCCTTGGGCCCATTCCATGTTGAGGTCTCTGGG 
GAGGAGTCTCTGTCCTTTAGTGGTCAAAATGGCAGAGAGAAACATGAAGTTGTTCTCAGGAAG 
AGTGGTGCCAGCCCAGGGGAAAGAAACCTTTGAAAACTGGCTGATCCAAGTCAATGAGGTCCT 
GCCAGATTGGAGTATGTCTGAGGAGGAAAAACTCAAGCGCTTGATGAAAACACTTAGGGGCCC 
TGCCCGGGAGGTCATGCGTTTGCTTCAGGCGGCCAACCCCAACCTAAGTGTAGCAGATTTCTT 
GCGGGCAATGAAATTGGTGTTTGGGGAGTCTGAAAGCAGTGTGACTGCCCATGGTAAATTTTT 
lAAv.Av_i.lr lvjUA\jijUA\_AAVjvj{jVjAv*AAAt3L.U 1 1 1 Al Gl CjATCCGTTTAGAGGTGCAGCT 
CCAGAATGCTATTCAGGCAGGCATCCTAGCTGAGAAAGATGCAAACCAGACTCGCTTGCAACA 
GCTTCTTTTAGGCGCTGAGCTGAATAGGGACCTGCGCTTCAGGCTTAAGCATCTTCTCAGGAT 
GTATGCAAATAAGCAGGAGCGGCTTCCCAATTTCCTGGAGTTAATCAAGATGATAAGGGAGGA
AGAGGATTGGGATGATGCTTTTATTAAACGGAAGCGGCCGAAAAGGTCTGAGCCAATAATGGA 
GAGGGCAGCCAGCCCTGTGGCATTTCAGGGCGCCCAGCCAATAGCAATCAGCAGTGCTGACTG 
TAACTGCAACGTGATAGAAATAGATGATACCCTTGATGACTCTGATGAGGATGTGATCCTGGT 
GGTGTCTCTGTACCCTTCACTGACACCTACAGGTGCCCCTCCCTTCAGAGGAAGAGCCAGACC 
TCTGGATCAAGTGCTGGTTATTGATTCCCCCAACAATTCTGGGGCTCAGTCTCTTTCTACCAG 
TGGTGGTTCTGGGTATAAGAATGATGGTCCTGGGAATATTCGTAGAGCCAGGAAGCGaAAATA 
CACAACCCGCTGTTCATATTGTGGGGAGGAGGGCCACTCaAAAGAAACCTGTGACAATGAGAG 

caacaaggcccaggtttttgagaatctgatcatcaccctgcaggagctgacacatacagagga 
gaggtcaaaagaggtccctggagaacacagtgatgcttctgagccacagtaaggatctagtcc 
agccctaaatgagtccttgactgtattcagagtctggtaatgggaataacaggagaggggggt 




gggtttctaactgcatgaattaa 




ORF Start: ATG at 101 | ORF Stop: TAA at 1310

SEQ ID NO: 290 | 403 aa | MW at 45159.8kD

NOV83a, CG59536-01 Protein Sequence


masitacvgnsrqqnaplppwahsmlrslgrslcplvvkmaernmklfsgrvvpaqgketfen
wliqvnevlpdwsmseeeklkrlmktlrgparevmrllqaanpnlsvadflramklvfgeses
svtahgkffntlqaqgekaslyvirlevqlqnaiqagilaekdanqtrlqqlllgaelnrdlr
frlkhllrmyankqerlpnflelikmireeedwddafikrkrpkrsepimeraaspvafqgaq
piaissadcncnvieiddtlddsdedvilvvslypsltptgappfrgrarpldqvlvidspnn
sgaqslstsggsgykndgpgnirrarkrkyttrcsycgeeghsketcdnesnkaqvfenliit 
lqelthteerskevpgehsdasepq 


SEQ ID NO: 291 | 1360 bp

NOV83b, CG59536-02 DNA Sequence

GTTGCTTTGGAGAATCCTGCAGATAAGGCTTTTCCAAAAAGCGCGAGCATCTTGTTGTATTCA


GATACCCTACCGTCGTCAGTCATGGCTAGCATCACTGCGCGTGTGGGTAACAGCAGGCAGCAG 
AATGCACCTTTGCCGCCTTGGGCCCATTCCATGTTGAGGTCTCTGGGGAGGAGTCTCTGTCCT 
TTAGTGGTCaaaATGGCAGAGAGAAACATGAAGTTGTTCTCAGGAAGAGTGGTGCCAGCCCAG 
GGGAAAGaAACCTTTGAAAACTGGCTGATCCAAGTCAATGAGGTCCTGCCAGATTGGAGTATG 
TCTGAGGAGGaAAAACTCAAGCGCTTGATGAAAACACTTAGGGGCCCTGCCCGGGAGGTCATG 
CGTTTGCTTCAGGCGGCCaACCCCAACCTAAGTGTAGCAGATTTCTTGCGGGCAATGAAATTG 
GTGTTTGGGGAGTCTGAaAGCAGTGTGACTGCCCATGGTAAATTTTTTAACACCCTGCAGGCA 

caaggggagaaagcctccctttatgtgatccgtttagaggtgcagctccagaatgctattcag 
gcaggcatcctagctgagaaaggtgcaaaccagactcgcttgcaacagcttcttttaggcgct 
gagctgaatagggacctgcgcttcaggcttaagcatcttctcaggatgtatgcaaataagcag 
gagcggcttcccaatttcctggagttaatcaagatgataagggaggaagaggattgggatgat 
gcttttattaaacggaagcggccgaaaaggtctgagccaataatggagagggcagccagccct 
gtggcatttcagggcgcccagccaatagcaatcagcagtgctgactgtaactgcaacgtgata 
gaaatagatgatacccttgatgactctgatgaggatgtgatcctggtggtgtctctgtaccct 
tcactgacacctacaggtgcccctcccttcagaggaagagccagacctctggatcaagtgctg 
gttattgattcccccaacaattctggggctcagtctctttctaccagtggtggttctgggtat 
aagaatgatggtcctgggaatattcgtagagccaggaagcgaaaatacacaacccgctgttca 
tattgtggggaggagggccactcaaaagaaacctgtgacaatgagagcaacaaggcccaggtt 
tttgagaatctgatcatcaccctgcaggagctgacacatacagaggagaggtcaaaagaggtc 
cctggagaacacagtgatgcttctgagccacagtaaggatctagtccagccctaaatgagtcc 
ttgactgtattcagagtctggtaatgggaataacagg 


ORF Start: ATG at 85 | ORF Stop: TAA at 1294

SEQ ID NO: 292 | 403 aa | MW at 45154.9kD

NOV83b, CG59536-02 Protein Sequence

MASITARVGNSRQQNAPLPPWAHSMLRSLGRSLCPLVVKMAERNMKLFSGRVVPAQGKETFEN
WLIQVNEVLPDWSMSEEEKLKRLMKTLRGPAREVMRLLQAANPNLSVADFLRAMKLVFGESES
SVTAHGKFFNTLQAQGEKASLYVIRLEVQLQNAIQAGILAEKGANQTRLQQLLLGAELNRDLR
FRLKHLLRMYANKQERLPNFLELIKMIREEEDWDDAFIKRKRPKRSEPIMERAASPVAFQGAQ
PIAISSADCNCNVIEIDDTLDDSDEDVILVVSLYPSLTPTGAPPFRGRARPLDQVLVIDSPNN
SGAQSLSTSGGSGYKNDGPGNIRRARKRKYTTRCSYCGEEGHSKETCDNESNKAQVFENLIIT
LQELTHTEERSKEVPGEHSDASEPQ



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 83B. 



Table 83B. Comparison of NOV83a against NOV83b.

Protein Sequence | NOV83a Residues / Match Residues | Identities / Similarities for the Matched Region
NOV83b | 1..403 / 1..403 | 384/403 (95%) / 384/403 (95%)



5 



Further analysis of the NOV83a protein yielded the following properties shown in 
Table 83C. 



Table 83C. Protein Sequence Properties NOV83a

PSort analysis: 0.7000 probability located in nucleus; 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen); 0.0000 probability located in endoplasmic reticulum (membrane)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV83a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 83D.



Table 83D. Geneseq Results for NOV83a 


Geneseq 
Identifier 


Protein/Organism/Length [Patent #, 
Date) 


NOV83a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Region 


Expect 
Value 


AAM51624 


KIAA0883-44 polypeptide - 
Unidentified, 403 aa. [WO200! 83540- 
AK08-NOV-200I] 


I..403 
I..403 


401/403(99%) 
401/403(99%) 


0.0 


AAB60478 


Human cell cycle and proliferation 
protein CCYPR-26, SEQ ID NO:26 - 
Homo sapiens, 402 aa. [WO200 107471- 
A2,01-FEB-2001] 


1..402 
1..401 


339/403 (84%) 
364/403 (90%) 


0.0 


AAM25693 








e-154 





MO: 1208 - Homo sapiens, 337 aa. 
[WO200153455-A2, 26-JUL-200I] 


I..337 


298/339 (87%) 




AAB 12528 


Human Ma4 protein SEQ ID NO:l 1 - 
Homo sapiens, 283 aa. [J P2000 146982- 
A, 26-MAY-2000] 


22..226 
62..260 


84/205 (40%) 
132/205(63%) 


7e-37 


AAB74701 


Human membrane associated protein 
MEMAP-7 - Homo sapiens, 353 aa. 
[WO2001 12662- A2, 22-FEB-2001] 


4..220 
1 19..332 


87/218(39%) 
127/218(57%) 


3e-36 



In a BLAST search of public sequence databases, the NOV83a protein was found to
have homology to the proteins shown in the BLASTP data in Table 83E.



Table 83E. Public BLASTP Results for NOV83a 


Protein Accession Number | Protein/Organism/Length | NOV83a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q8TE36 | CU46H11.1 (novel protein) - Homo sapiens (Human), 403 aa. | 1..403 / 1..403 | 403/403 (100%) / 403/403 (100%) | 0.0
AAH31241 | Similar to cU46H11.1 (novel protein) - Homo sapiens (Human), 402 aa. | 1..402 / 1..401 | 339/403 (84%) / 364/403 (90%) | 0.0
Q9CZA5 | 2810028A01Rik protein - Mus musculus (Mouse), 402 aa. | 1..403 / 1..402 | 291/404 (72%) / 332/404 (82%) | e-161
Q8VD24 | RIKEN cDNA 1500031H04 gene - Mus musculus (Mouse), 393 aa. | 1..393 / 1..392 | 268/394 (68%) / 322/394 (81%) | e-151
Q9DB17 | 1500031H04Rik protein - Mus musculus (Mouse), 393 aa. | 1..393 / 1..392 | 267/394 (67%) / 322/394 (80%) | e-150




PFam analysis predicts that the NOV83a protein contains the domains shown in the 
Table 83F. 



Table 83F. Domain Analysis of NOV83a 


Pfam Domain | NOV83a Match Region | Identities/Similarities for the Matched Region | Expect Value
zf-CCHC | 347..364 | 6/18 (33%) / 12/18 (67%) | 0.059






Example 84. 

The NOV84 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 84A. 



Table 84A. NOV84 Sequence Analysis 




NOV84a, CG59794-01 DNA Sequence (SEQ ID NO: 293, 3061 bp)


GTTTGAACTAGATCTTTGGTGAGCACCCTCACATCTCTAAATGTGGGTTTCAGACTCTGAATT 


TTTGCCACTTTCTTATTGCTTCTCTTGCACAGGGATCATGGCCCAGGTAGCAGTGTCCACCCT 
GCCTGTTGAAGAAGAGTCCTCCTCAGAGACCAGGATGGTGGTGACATTCCTCGTGTCTGCCCT 
CGAATCCATGTGTAAAGAACTGGCCAAGTCCAAGGCAGAAGTGGCCTGCATCGCAGTGTACGA 
AACAGACGTGTTTGTCGTCGGAACCGAGAGAGGATGCGCTTTTGTTAATGCCAGGACGGATTT
TCAGAAAGATTTTGCAAAATACTGTAGGGCAGAGGGACTGTGTGAGGTGAAACCTCCCTGCCC 
TGTGAACGGGATGCAGGTCCACTCGGGCGAAACGGAAATACTCAGGAAGGCAGTGGAGGACTA 
TTTCTGCTTTTGTTATCAGAAAGCCTTAGGGACAACAGTGATGGTGCCTGTTCCCTATGAGAA 
GATGCTGCGAGACCAGTCGGCTGTGGTAGTGCAGGGGCTTCCGGAAGGCGTTGCCTTTCAACA 
CCCTGAGAATTACGACCTTGCAACCCTGAAATGGATTTTGGAGAACAAAGCAGGGATTTCATT 
CATCATAAATAGGAGACCCTTCCTAGGACCAGAGAGTCAGCTGGGTGGCCCTGGGATGGTAAC 
AGATGCGGAGAGATCCATAGTATCACCAAGTGAAAGCTGCGGCCCCATCAATGTGAAAACTGA 
ACCCATGGAAGATTCTGGCATTTCACTGAAAGCAGAAGCTGTCTCAGTCAAGAAAGAATCAGA 
AGATCCTAATTACTATCAATATAATATGCAAGGTAATAGCCACCCTTCTTCCACAAGCAATGA 
AGTAATAGAAATGGAATTACCAATGGAACAAGATTCCACTCCGCTGGTCCCTTCAGAAGAACC
AAATGAGGACCCTGAAGCCGAGGTGAAAATCGAAGGTGGAAACACAAATTCATCCAGTGTTAC 
AAATTCTGCAGCAGGTGTTGAAGATCTTAACATCGTTCAAGTGACTGTTGATAATGAGAAGGA 
AAGATTATCAAGCATTGAAAAGATTAAACAGCTAAGAGAACAAGTTAATGACCTCTTTAGCCG
AAAATTTGGTAAAGCAATTGGCGTGGATTTCCCTGTGAAAGTTCCCTACAGGAAGATCACATT 
CAACCCTGGCTGTGTGGTGATTGATGGCATGCCCCCGGGGGTGGTATTCAAGGCCCCCGGCTA 
TCTGGAAATCAGTTCCATGAGGAGGATCTTGGAGGCAGCTGAGTTTATCAAATTCACAGTCAT 
CAGGCCGCTTCCAGGGCTTGAGCTCAGTAATGTGGGAAAACGCAAGATAGACCAGGAGGGCCG 
TGTGTTTCAAGAAAAGTGGGAGAGAGCGTATTTCTTCGTGGAAGTACAGAATATTCCAACATG 
TCTCATATGCAAACAAAGCATGTCTGTGTCCAAAGAATATAACCTAAGACGCCACTATCAAAC
CAATCACAGCAAGCATTATGACCAGTATATGGAAAGAATGCGTGACGAGAAGCTTCACGAGCT
GAAAAAAGGGCTCAGGAAGTATCTCTTAGGCTTGTCAGACACCGAGTGTCCCGAGCAAAAACA 
AGTGTTTGCAAACCCAAGTCCAACCCAGAAATCCCCCGTGCAGCCTGTAGAGGACCTAGCTGG 
GAACTTATGGGAGAAGTTACGTGAAAAAATCAGGTCTTTTGTGGCATATTCTATCGCAATCGA 
TGAGATCACGGATATAAATAATACCACCCAGTTGGCCATATTCATCCGTGGTGTCGATGAGAA 
TTTCGATGTGTCCGAAGAACTTCTGGACACGGTGCCCATGACGGGTACAAAATCTGGCAACGA 
GATCTTTTCGCGTGTTGAGAAGAGCCTGAAAAAGTTCTGTATCGACTGGTCGAAATTAGTAAG 
CGTGGCCTCCACTGGCACCCCAGCGATGGTGGATGCCAATAACGGGCTTGTCACAAAACTGAA 
GTCCAGGGTGGCGACGTTCTGCAAGGGTGCGGAACTGAAGTCCATCTGTTGTATAATTCATCC 
GGAATCACTCTGTGCTCAGAAGTTGAAGATGGACCACGTCATGGACGTGGTAGTGAAGTCCGT 
GAACTGGATATGCTCCCGGGGACTGAACCACAGTGAGTTCACAACCTTGCTCTATGAGCTGGA 
CAGCCAGTATGGTAGCCTCCTGTACTACACGGAGATTAAGTGGCTCAGTCGCGGGCTCGTGCT 
AAAGAGATTTTTCGAATCCTTGGAAGAAATCGACTCCTTCATGTCATCCAGAGGGAAACCCCT 
GCCTCAACTGAGCTCCATAGATTGGATCCGAGACCTGGCCTTCTTGGTTGACATGACGATGCA
TCTGAACGCTTTGAACATCTCTCTCCAAGGACACTCCCAAATCGTCACGCAGATGTATGACCT 
GATCCGGGCGTTCCTAGCAAAACTGTGCCTCTGGGAGACTCATTTGACGAGGAATAATCTGGC 
CCACTTTCCCACCCTGAAATTGGCTTCCAGAAATGAAAGCGATGGCCTGAACTACATTCCCAA
AATCGCGGAACTCAAGACCGAATTCCAGAAAAGGCTGTCTGATTTCAAACTCTACGAAAGCGA
ACTGACTCTGTTCAGCTCCCCGTTCTCCACGAAGATCGACAGTGTGCACGAGGAGCTCCAGAT 
GGAGGTTATCGACCTGCAATGCAACACGGTCCTGAAGACGAAATACGACAAGGTGGGAATACC 
AGAATTCTACAAGTACCTCTGGGGTAGCTACCCGAAATACAAGCACCATTGCGCAAAGATTCT
TTCCATGTTCGGGAGCACCTACATTTGCGAACAGCTGTTCTCCATTATGAAACTGAGCAAAAC
AAAATACTGCTCCCAGTTAAAGGATTCCCAGTGGGATTCTGTACTCCACATCGCAACGTGATG 
GAGAGAAAACTCCTGGCAGGGCCCTATGGTGGGAAAGGCTGGAGTCTTCTAGTCCCAAGGGAT 


TGGGAGATGACAAAATGAATTTTTTTTTCTTTTTTGA




ORF Start: ATG at 41; ORF Stop: TGA at 2957




NOV84a, CG59794-01 Protein Sequence (SEQ ID NO: 294, 972 aa, MW at 109960.0kD)

MWVSDSEFLPLSYCFSCTGIMAQVAVSTLPVEEESSSETRMVVTFLVSALESMCKELAKSKAE
VACIAVYETDVFVVGTERGCAFVNARTDFQKDFAKYCRAEGLCEVKPPCPVNGMQVHSGETEI
LRKAVEDYFCFCYQKALGTTVMVPVPYEKMLRDQSAVVVQGLPEGVAFQHPENYDLATLKWIL


ENKAGISFIINRRPFLGPESQLGGPGMVTDAERSIVSPSESCGPINVKTEPMEDSGISLKAEA 
VSVKKESEDPNYYQYNMQGNSHPSSTSNEVIEMELPMEQDSTPLVPSEEPNEDPEAEVKIEGG 
NTNSSSVTNSAAGVEDLNIVQVTVDNEKERLSSIEKIKQLREQVNDLFSRKFGKAIGVDFPVK 
VPYRKITFNPGCVVIDGMPPGVVFKAPGYLEISSMRRILEAAEFIKFTVIRPLPGLELSNVGK
RKIDQEGRVFQEKWERAYFFVEVQNIPTCLICKQSMSVSKEYNLRRHYQTNHSKHYDQYMERM 
RDEKLHELKKGLRKYLLGLSDTECPEQKQVFANPSPTQKSPVQPVEDLAGNLWEKLREKIRSF 
VAYSIAIDEITDINNTTQLAIFIRGVDENFDVSEELLDTVPMTGTKSGNEIFSRVEKSLKKFC 
IDWSKLVSVASTGTPAMVDANNGLVTKLKSRVATFCKGAELKSICCIIHPESLCAQKLKMDHV 
MDVVVKSVNWICSRGLNHSEFTTLLYELDSQYGSLLYYTEIKWLSRGLVLKRFFESLEEIDSF
MSSRGKPLPQLSSIDWIRDLAFLVDMTMHLNALNISLQGHSQIVTQMYDLIRAFLAKLCLWET
HLTRNNLAHFPTLKLASRNESDGLNYIPKIAELKTEFQKRLSDFKLYESELTLFSSPFSTKID 
SVHEELQMEVIDLQCNTVLKTKYDKVGIPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLF 
SIMKLSKTKYCSQLKDSQWDSVLHIAT 




NOV84b, CG59794-02 DNA Sequence (SEQ ID NO: 295, 2031 bp)


CAGACCTGCGGCGACAGAGCAAAACTCCGTCTCAACAACAACGACAACAAAAATTCAGTCTTC 


AGGTTTTCTTTAGAAAACTTGAAGATCTGGCCACAGCTGGCATCCTGGCAGCGGTTTGCTGGA 


GTTGAGGGTCAGCCGTCCCTCTGCAGGGTGGGTCACCCTCCTGTTAACCACGCCCTGCCCCGC 


CCCGCTTCCTCCCTCTCGTGCGTCATCAAGCATTTGCTGTTGTTTTCCTCATAGTAGTGATAA 


GAGAAAAGTGAAATATCTTTGTCTCCCTGTCTCTGTCAAAAGTGGGAAAACGCAAGATAGACC 


AGGAGGGCCGTGTGTTTCAAGAAAAGTGGGAGAGAGCGTATTTCTTCGTGGAAGTACAGAATA 


TTCCAACATGTCTCATATGCAAACAAAGCATGTCTGTGTCCAAAGAATATAACCTAAGACGCC 


ACTATCAAACCAATCACAGCAAGCATTATGACCAGTATATGGAAAGAATGCGTGACGAGAAGC 
TTCACGAGCTGAAAAAAGGGCTCAGGAAGTATCTCTTAGGCTTGTCAGACACCGAGTGTCCCG 
AGCAAAAACAAGTGTTTGCAAACCCAAGTCCAACCCAGAAATCCCCCGTGCAGCCTGTAGAGG 
ACCTAGCTGGGAACTTATGGGAGAAGTTACGTGAAAAAATCAGGTCTTTTGTGGCATATTCTA 
TCGCAATCGATGAGATCACGGATATAAATAATACCACCCAGTTGGCCATATTCATCCGTGGTG 
TCGATGAGAATTTCGATGTGTCCGAAGAACTTCTGGACACGGTGCCCATGACGGGTACAAAAT 
CTGGCAACGAGATCTTTTCGCGTGTTGAGAAGAGCCTGAAAAAGTTCTGTATCGACTGGTCGA 
AATTAGTAAGCGTGGCCTCCACTGGCACCCCAGCGATGGTGGATGCCAATAACGGGCTTGTCA 
CAAAACTGAAGTCCAGGGTGGCGACGTTCTGCAAGGGTGCGGAACTGAAGTCCATCTGTTGTA 
TAATTCATCCGGAATCACTCTGTGCTCAGAAGTTGAAGATGGACCACGTCATGGACGTGGTAG 
TGAAGTCCGTGAACTGGATATGCTCCCGGGGACTGAACCACAGTGAGTTCACAACCTTGCTCT
ATGAGCTGGACAGCCAGTATGGTAGCCTCCTGTACTACACGGAGATTAAGTGGCTCAGTCGCG 
GGCTCGTGCTAAAGAGATTTTTCGAATCCTTGGAAGAAATCGACTCCTTCATGTCATCCAGAG 
GGAAACCCCTGCCTCAACTGAGCTCCATAGATTGGATCCGAGACCTGGCCTTCTTGGTTGACA 
TGACGATGCATCTGAACGCTTTGAACATCTCTCTCCAAGGACACTCCCAAATCGTCACGCAGA 
TGTATGACCTGATCCGGGCGTTCCTAGCAAAACTGTGCCTCTGGGAGACTCATTTGACGAGGA 
ATAATCTGGCCCACTTTCCCACCCTGAAATTGGCTTCCAGAAATGAAAGCGATGGCCTGAACT 
ACATTCCCAAAATCGCGGAACTCAAGACCGAATTCCAGAAAAGGCTGTCTGATTTCAAACTCT
ACGAAAGCGAACTGACTCTGTTCAGCTCCCCGTTCTCCACGAAGATCGACAGTGTGCACGAGG 
AGCTCCAGATGGAGGTTATCGACCTGCAATGCAACACGGTCCTGAAGACGAAATACGACAAGG 
TGGGAATACCAGAATTCTACAAGTACCTCTGGGGTAGCTACCCGAAATACAAGCACCATTGCG 
CAAAGATTCTTTCCATGTTCGGGAGCACCTACATCTGCGAACAGCTGTTCTCCATTATGAAAC 
TGAGCAAAACAAAATACTGCTCCCAGTTAAAGGATTCCCAGTGGGATTCTGTACTCCACATCG 
CAACGTGATGGAGAGAAAACTCCTGGCAGGGCCCTATGGTGGGAAAGGCTGGAGTCTTCTAGT 


CCCAAGGGATTGGGAGATGACAAAATGAATTTTTTTTTCTTTTTTGAGATGGAGTCTTGCTCT 


GTCGCCGCAGGTCTG 




ORF Start: ATG at 408; ORF Stop: TGA at 1896




NOV84b, CG59794-02 Protein Sequence (SEQ ID NO: 296, 496 aa, MW at 57221.3kD)


MSVSKEYNLRRHYQTNHSKHYDQYMERMRDEKLHELKKGLRKYLLGLSDTECPEQKQVFANPS 
PTQKSPVQPVEDLAGNLWEKLREKIRSFVAYSIAIDEITDINNTTQLAIFIRGVDENFDVSEE 
LLDTVPMTGTKSGNEIFSRVEKSLKKFCIDWSKLVSVASTGTPAMVDANNGLVTKLKSRVATF
CKGAELKSICCIIHPESLCAQKLKMDHVMDVVVKSVNWICSRGLNHSEFTTLLYELDSQYGSL
LYYTEIKWLSRGLVLKRFFESLEEIDSFMSSRGKPLPQLSSIDWIRDLAFLVDMTMHLNALNI 
SLQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAELKT 
EFQKRLSDFKLYESELTLFSSPFSTKIDSVHEELQMEVIDLQCNTVLKTKYDKVGIPEFYKYL
WGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQLKDSQWDSVLHIAT 




NOV84c, CG59794-03 DNA Sequence (SEQ ID NO: 297, 3123 bp)

CGAGTGGCGAGCAGGGGCCTCGGCCGCCACCCACACGCCCCGAAGCGTGCTCGTCCCCCGCGC
GGGGCTCCCGGCCGCCGCCCTCGGCCATCGGCTGCTCCCCGGTGGCCCAGGCCTCGGACTCCG
CGGCCGGCCCGGCGCGGCCCAGCGCCCTCAGGGATCATGGCCCAGGTAGCAGTGTCCACCCTG
CCTGTTGAAGAAGAGTCCTCCTCAGAGACCAGGATGGTGGTGACATTCCTCGTGTCTGCCCTC
GAATCCATGTGTAAAGAACTGGCCAAGTCCAAGGCAGAAGTGGCCTGCATCGCAGTGTACGAA



ACAGACGTGTTTGTCGTCGGAACCGAGAGAGGATGCGCTTTTGTTAATGCCAGGACGGATTTT 
CAGAAAGATTTTGCAAAATACTGCGTTGCAGAGGGACTGTGTGAGGTGAAACCTCCCTGCCCT 
GTGAACGGGATGCAGGTCCACTCGGGCGAAACGGAAATACTCAGGAAGGCAGTGGAGGACTAT 
TTCTGCTTTTGTTATCAGAAAGCCTTAGGGACAACAGTGATGGTGCCTGTTCCCTATGAGAAG 
ATGCTGCGAGACCAGTCGGCTGTGGTAGTGCAGGGGCTTCCGGAAGGCGTTGCCTTTCAACAC 
CCTGAGAATTACGACCTTGCAACCCTGAAATGGATTTTGGAGAACAAAGCAGGGATTTCATTC 
ATCATAAATAGGAGACCCTTCCTAGGACCAGAGAGTCAGCTGGGTGGCCCTGGGATGGTAACA 
GATGCGGAGAGATCCATAGTATCACCAAGTGAAAGCTGCGGCCCCATCAATGTGAAAACTGAA 
CCCATGGAAGATTCTGGCATTTCACTGAAAGCAGAAGCTGTCTCAGTCAAGAAAGAATCAGAA 
GATCCTAATTACTATCAATATAATATGCAAGGTAATAGCCACCCTTCTTCCACAAGCAATGAA 
GTAATAGAAATGGAATTACCAATGGAACAAGATTCCACTCCGCTGGTCCCTTCAGAAGAACCA 
AATGAGGACCCTGAAGCCGAGGTGAAAATCGAAGGTGGAAACACAAATTCATCCAGTGTTACA 
AATTCTGCAGCAGGTGTTGAAGATCTTAACATCGTTCAAGTGACTGTTGATAATGAGAAGGAA 
AGATTATCAAGCATTGAAAAGATTAAACAGCTAAGAGAACAAGTTAATGACCTCTTTAGCCGA 
AAATTTGGTAAAGCAATTGGCGTGGATTTCCCTGTGAAAGTTCCCTACAGGAAGATCACATTC 
AACCCTGGCTGTGTGGTGATTGATGGCATGCCCCCGGGGGTGGTATTCAAGGCCCCCGGCTAT 
CTGGAAATCAGTTCCATGAGGAGGATCTTGGAGGCAGCTGAGTTTATCAAATTCACAGTCATC 
AGGCCGCTTCCAGGGCTTGAGCTCAGTAATGTGGGAAAACGCAAGATAGACCAGGAGGGCCGT 
GTGTTTCAAGAAAAGTGGGAGAGAGCGTATTTCTTCGTGGAAGTACAGAATATTCCAACATGT 
CTCATATGCAAACAAAGCATGTCTGTGTCCAAAGAATATAACCTAAGACGCCACTATCAAACC 
AATCACAGCAAGCATTATGACCAGTATATGGAAAGAATGCGTGACGAGAAGCTTCACGAGCTG 
AAAAAAGGGCTCAGGAAGTATCTCTTAGGCTTGTCAGACACCGAGTGTCCCGAGCAAAAACAA 
GTGTTTGCAAACCCAAGTCCAACCCAGAAATCCCCCGTGCAGCCTGTAGAGGACCTAGCTGGG 
AACTTATGGGAGAAGTTACGTGAAAAAATCAGGTCTTTTGTGGCATATTCTATCGCAATCGAT 
GAGATCACGGATATAAATAATACCACCCAGTTGGCCATATTCATCCGTGGTGTCGATGAGAAT 
TTCGATGTGTCCGAAGAACTTCTGGACACGGTGCCCATGACGGGTACAAAATCTGGCAACGAG
ATCTTTTCGCGTGTTGAGAAGAGCCTGAAAAAGTTCTGTATCGACTGGTCGAAATTAGTAAGC 
GTGGCCTCCACTGGCACCCCAGCGATGGTGGATGCCAATAACGGGCTTGTCACAAAACTGAAG 
TCCAGGGTGGCGACGTTCTGCAAGGGTGCGGAACTGAAGTCCATCTGTTGTATAATTCATCCG 
GAATCACTCTGTGCTCAGAAGTTGAAGATGGACCACGTCATGGACGTGGTAGTGAAGTCCGTG 
AACTGGATATGCTCCCGGGGACTGAACCACAGTGAGTTCACAACCTTGCTCTATGAGCTGGAC
AGCCAGTATGGTAGCCTCCTGTACTACACGGAGATTAAGTGGCTCAGTCGCGGGCTCGTGCTA 
AAGAGATTTTTCGAATCCTTGGAAGAAATCGACTCCTTCATGTCATCCAGAGGGAAACCCCTG 
CCTCAACTGAGCTCCATAGATTGGATCCGAGACCTGGCCTTCTTGGTTGACATGACGATGCAT 
CTGAACGCTTTGAACATCTCTCTCCAAGGACACTCCCAAATCGTCACGCAGATGTATGACCTG 
ATCCGGGCGTTCCTAGCAAAACTGTGCCTCTGGGAGACTCATTTGACGAGGAATAATCTGGCC 
CACTTTCCCACCCTGAAATTGGCTTCCAGAAATGAAAGCGATGGCCTGAACTACATTCCCAAA 
ATCGCGGAACTCAAGACCGAATTCCAGAAAAGGCTGTCTGATTTCAAACTCTACGAAAGCGAA 
CTGACTCTGTTCAGCTCCCCGTTCTCCACGAAGATCGACAGTGTGCACGAGGAGCTCCAGATG 
GAGGTTATCGACCTGCAATGCAACACGGTCCTGAAGACGAAATACGACAAGGTGGGAATACCA
GAATTCTACAAGTACCTCTGGGGTAGCTACCCGAAATACAAGCACCATTGCGCAAAGATTCTT 
TCCATGTTCGGGAGCACCTACATTTGCGAACAGCTGTTCTCCATTATGAAACTGAGCAAAACA 
AAATACTGCTCCCAGTTAAAGGATTCCCAGTGGGATTCTGTACTCCACATCGCAACGTGATGG 
AGAGAAAACTCCTGGCAGGGCCCTATGGTGGGAAAGGCTGGAGTCTTCTAGTCCCAAGGGATT 



GGGAGATGACAAAATGAATTTTTTTTTCTTTTTTGA

ORF Start: ATG at 163; ORF Stop: TGA at 3019

NOV84c, CG59794-03 Protein Sequence (SEQ ID NO: 298, 952 aa, MW at 107635.4kD)



MAQVAVSTLPVEEESSSETRMVVTFLVSALESMCKELAKSKAEVACIAVYETDVFVVGTERGC
AFVNARTDFQKDFAKYCVAEGLCEVKPPCPVNGMQVHSGETEILRKAVEDYFCFCYQKALGTT
VMVPVPYEKMLRDQSAVVVQGLPEGVAFQHPENYDLATLKWILENKAGISFIINRRPFLGPES
QLGGPGMVTDAERSIVSPSESCGPINVKTEPMEDSGISLKAEAVSVKKESEDPNYYQYNMQGN 
SHPSSTSNEVIEMELPMEQDSTPLVPSEEPNEDPEAEVKIEGGNTNSSSVTNSAAGVEDLNIV 
QVTVDNEKERLSSIEKIKQLREQVNDLFSRKFGKAIGVDFPVKVPYRKITFNPGCVVIDGMPP
GVVFKAPGYLEISSMRRILEAAEFIKFTVIRPLPGLELSNVGKRKIDQEGRVFQEKWERAYFF
VEVQNIPTCLICKQSMSVSKEYNLRRHYQTNHSKHYDQYMERMRDEKLHELKKGLRKYLLGLS 
DTECPEQKQVFANPSPTQKSPVQPVEDLAGNLWEKLREKIRSFVAYSIAIDEITDINNTTQLA 
IFIRGVDENFDVSEELLDTVPMTGTKSGNEIFSRVEKSLKKFCIDWSKLVSVASTGTPAMVDA 
NNGLVTKLKSRVATFCKGAELKSICCIIHPESLCAQKLKMDHVMDVVVKSVNWICSRGLNHSE
FTTLLYELDSQYGSLLYYTEIKWLSRGLVLKRFFESLEEIDSFMSSRGKPLPQLSSIDWIRDL
AFLVDMTMHLNALNISLQGHSQIVTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNE
SDGLNYIPKIAELKTEFQKRLSDFKLYESELTLFSSPFSTKIDSVHEELQMEVIDLQCNTVLK 
TKYDKVGIPEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQLKDSQWD 
SVLHIAT 
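
As a consistency check on Table 84A, the reported ORF coordinates can be compared with the reported protein lengths. The sketch below assumes that "ORF Start" is the 1-based position of the first base of the ATG, that "ORF Stop" is the 1-based position of the first base of the stop codon, and that the stop codon itself is not translated. These conventions are inferred from the numbers in the table rather than stated in the text, and the helper name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(orf_start: int, orf_stop: int) -> int:
    """Return the number of residues encoded from orf_start (inclusive)
    up to, but not including, the stop codon that begins at orf_stop."""
    coding_nt = orf_stop - orf_start
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("ORF coordinates are not in frame")
    return coding_nt // 3


# (name, ORF start, ORF stop, reported length in aa), as listed in Table 84A
TABLE_84A = [
    ("NOV84a", 41, 2957, 972),
    ("NOV84b", 408, 1896, 496),
    ("NOV84c", 163, 3019, 952),
]

for name, start, stop, reported in TABLE_84A:
    computed = orf_protein_length(start, stop)
    print(f"{name}: computed {computed} aa, reported {reported} aa")
```

Under these assumptions the computed and reported lengths agree for all three NOV84 variants.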






Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 84B. 



Table 84B. Comparison of NOV84a against NOV84b and NOV84c. 


Protein Sequence | NOV84a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV84b | 477..972 / 1..496 | 484/496 (97%) / 484/496 (97%)
NOV84c | 21..972 / 1..952 | 923/952 (96%) / 923/952 (96%)
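
The percentages in Table 84B can be reproduced from the match counts. The sketch below assumes the percentage is the ratio of matched positions to aligned positions, truncated to a whole number; truncation (rather than rounding) is an inference from the table values, not something the text states. The function name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matched positions, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# (name, matches, aligned length) for the identity entries in Table 84B
for name, matches, aligned in [("NOV84b", 484, 496), ("NOV84c", 923, 952)]:
    pct = percent_identity(matches, aligned)
    print(f"{name}: {matches}/{aligned} ({pct}%)")
```

With rounding instead of truncation, both entries would come out one point higher, which is why truncation is assumed here.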



Further analysis of the NOV84a protein yielded the following properties shown in
Table 84C.



Table 84C. Protein Sequence Properties NOV84a

PSort analysis: 0.3600 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in lysosome (lumen); 0.1000 probability located in nucleus

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV84a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 84D.



Table 84D. Geneseq Results for NOV84a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV84a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
ABG22337 | Novel human diagnostic protein #22328 - Homo sapiens, 1205 aa. [WO200175067-A2, 11-OCT-2001] | 19..972 / 315..1205 | 864/961 (89%) / 871/961 (89%) | 0.0
AAM79866 | Human protein SEQ ID NO 3512 - Homo sapiens, 663 aa. [WO200157190-A2, 09-AUG-2001] | 316..972 / 1..663 | 650/663 (98%) / 653/663 (98%) | 0.0
ABB11827 | Human transcription factor 21-like protein homologue, SEQ ID NO:2197 - Homo sapiens, 663 aa. [WO200157188-A2, 09-AUG-2001] | 316..972 / 1..663 | 650/663 (98%) / 653/663 (98%) | 0.0
AAM78882 | Human protein SEQ ID NO 1544 - Homo sapiens, 582 aa. [WO200157190-A2, 09-AUG-2001] | 396..972 / 1..582 | 575/582 (98%) / 575/582 (98%) | 0.0
AAB42755 | Human ORFX ORF2519 polypeptide sequence SEQ ID NO:5038 - Homo sapiens, 533 aa. [WO200058473-A2, 05-OCT-2000] | 440..972 / 1..533 | 531/533 (99%) / 531/533 (99%) | 0.0


In a BLAST search of public sequence databases, the NOV84a protein was found to
have homology to the proteins shown in the BLASTP data in Table 84E.


Table 84E. Public BLASTP Results for NOV84a 


Protein Accession Number | Protein/Organism/Length | NOV84a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q99NI3 | Gtf2ird2 - Mus musculus (Mouse), 936 aa. | 21..972 / 1..936 | 749/953 (78%) / 836/953 (87%) | 0.0
BAC04576 | CDNA FLJ38253 fis, clone FCBBF3000768, moderately similar to Homo sapiens general transcription factor 2-I (GTF2I) mRNA - Homo sapiens (Human), 702 aa. | 269..972 / 1..702 | 693/705 (98%) / 697/705 (98%) | 0.0
CAD38861 | Hypothetical protein - Homo sapiens (Human), 496 aa. | 477..972 / 1..496 | 495/496 (99%) / 495/496 (99%) | 0.0
CAD38788 | Hypothetical protein - Homo sapiens (Human), 397 aa (fragment). | 579..972 / 4..397 | 394/394 (100%) / 394/394 (100%) | 0.0
Q9H739 | CDNA: FLJ21423 fis, clone COL04129 - Homo sapiens (Human), 364 aa. | 609..972 / 1..364 | 360/364 (98%) / 362/364 (98%) | 0.0




PFam analysis predicts that the NOV84a protein contains the domains shown in the 
Table 84F. 



Table 84F. Domain Analysis of NOV84a

Pfam Domain | NOV84a Match Region | Identities/Similarities for the Matched Region | Expect Value
GTF2I | 127..203 | 38/77 (49%) / 71/77 (92%) | 1.1e-39
GTF2I | 355..430 | 38/76 (50%) / 72/76 (95%) | 1.2e-41

Example 85.

The NOV85 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 85A. 



Table 85A. NOV85 Sequence Analysis 




NOV85a, CG59821-01 DNA Sequence (SEQ ID NO: 299, 530 bp)


CAGGATGAACGCTGCTTTCCAAGATGGCGACGGAGGGAGGAGGGAAGGAGATGAACGAGATTA 
AGACCCAATTCACCACCCGGGAAGGTCTGTACAAGCTGCTGCCGCACTCGGAGTACAGCCGGC 
CCAACCGGGTGCCCTTCAACTCGCAGGGATCCAACCCTGTCCGCGTCTCCTTCGTAAACCTCA
ACGACCAGTCTGGCAACGGCGACCGCCTCTGCTTCAATGTGGGCCGGGAGCTGTACTTCTATA
TCTACAAGGGGGTCCGCAAGGCTGCTGACTTGAGTAAACCAATAGATAAAAGGATATACAAAG
GAACACAGCCTACTTGTCATGACTTCAACCACCTAACAGCCACAGCAGAAAGTGTCTCTCTCC
TAGTGGGCTTTTCCGCAGGCCAAGTCCAGCTTATAGACCCAATCAAAAAAGAAACTAGCAAAC
TTTTTAATGAGGAAGGCTCATTGTCATCCCCAAGCCAGGCCAGTTCTCCAGGTGGAACTGTAG
TGTAGCGACCTCACTGCTGCGCGCAC 




ORF Start: ATG at 24; ORF Stop: TAG at 507




NOV85a, CG59821-01 Protein Sequence (SEQ ID NO: 300, 161 aa, MW at 17692.7kD)


MATEGGGKEMNEIKTQFTTREGLYKLLPHSEYSRPNRVPFNSQGSNPVRVSFVNLNDQSGNGD
RLCFNVGRELYFYIYKGVRKAADLSKPIDKRIYKGTQPTCHDFNHLTATAESVSLLVGFSAGQ
VQLIDPIKKETSKLFNEEGSLSSPSQASSPGGTVV




NOV85b, CG59821-02 DNA Sequence (SEQ ID NO: 301, 519 bp)


AGGATGAACGCTGCTTTCCAAGATGGCGACGGAGGGAGGAGGGAAGGAGATGAACGAGATTAA 
GACCCAATTCACCACCCGGGAAGGTCTGTACAAGCTGCTGCCGCACTCGGAGTACAGCCGGCC 
CAACCGGGTGCCCTTCAACTCGCAGGGATCCAACCCTGTCCGCGTCTCCTTCGTAAACCTCAA 
CGACCAGTCTGGCAACGGCGACCGCCTCTGCTTCAATGTGGGCCGGGAGCTGTACTTCTATAT 
CTACAAGGGGGTCCGCAAGGCTGCTGACTTGAGTAAACCAATAGATAAAAGGATATACAAAGG 
AACACAGCCTACTTGTCATGACTTCAACCACCTAACAGCCACAGCAGAAAGTGTCTCTCTCCT 
AGTGGGCTTTTCCGCAGGCCAAGTCCAGCTTATAGACCCAATCAAAAAAGAAACTAGCAAACT 
TTTTAATGAGGAAGGCTCATTGTCATCCCCAAGCCAGGCCAGTTCTCCAGGTGGAACTGTAGT 
GTAGCGACCTCACTG 




ORF Start: ATG at 23; ORF Stop: TAG at 506




NOV85b, CG59821-02 Protein Sequence (SEQ ID NO: 302, 161 aa, MW at 17692.7kD)


MATEGGGKEMNEIKTQFTTREGLYKLLPHSEYSRPNRVPFNSQGSNPVRVSFVNLNDQSGNGD 
RLCFNVGRELYFYIYKGVRKAADLSKPIDKRIYKGTQPTCHDFNHLTATAESVSLLVGFSAGQ 
VQLIDPIKKETSKLFNEEGSLSSPSQASSPGGTVV
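
The relationship between a DNA sequence in Table 85A and its encoded polypeptide can be illustrated by translating from the reported ORF start. The sketch below assumes the standard genetic code and a 1-based ORF start position; it uses only the first 63 bases of the NOV85b DNA sequence, so it yields only the beginning of the protein. The names `CODON_TABLE`, `translate_orf`, and `NOV85B_FIRST_LINE` are hypothetical and introduced here for illustration.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (third base varies fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str, orf_start: int) -> str:
    """Translate dna from the 1-based position orf_start until a stop codon
    or the end of the sequence; a trailing partial codon is ignored."""
    seq = dna.upper()
    residues = []
    for i in range(orf_start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First 63 bases of the NOV85b DNA sequence (SEQ ID NO: 301); ORF start at 23.
NOV85B_FIRST_LINE = (
    "AGGATGAACGCTGCTTTCCAAGATGGCGACGGAGGGAGGAGGGAAGGAGATGAACGAGATTAA"
)

print(translate_orf(NOV85B_FIRST_LINE, 23))  # MATEGGGKEMNEI
```

The output matches the first thirteen residues of the NOV85b protein sequence (SEQ ID NO: 302).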



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 85B. 



Table 85B. Comparison of NOV85a against NOV85b.

Protein Sequence | NOV85a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV85b | 1..161 / 1..161 | 147/161 (91%) / 147/161 (91%)



Further analysis of the NOV85a protein yielded the following properties shown in
Table 85C.

Table 85C. Protein Sequence Properties NOV85a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV85a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 85D.



Table 85D. Geneseq Results for NOV85a 


Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV85a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB68534 | Human GTP-binding associated protein #34 - Homo sapiens, 161 aa. [WO200105970-A2, 25-JAN-2001] | 1..161 / 1..161 | 161/161 (100%) / 161/161 (100%) | 3e-90
ABG14369 | Novel human diagnostic protein #14360 - Homo sapiens, 87 aa. [WO200175067-A2, 11-OCT-2001] | 10..90 / 1..80 | 77/81 (95%) / 78/81 (96%) | 5e-39
AAM79336 | Human protein SEQ ID NO 2982 - Homo sapiens, 687 aa. [WO200157190-A2, 09-AUG-2001] | 12..144 / 21..221 | 89/201 (44%) / 104/201 (51%) | 2e-31
AAM78352 | Human protein SEQ ID NO 1014 - Homo sapiens, 684 aa. [WO200157190-A2, 09-AUG-2001] | 12..144 / 21..221 | 89/201 (44%) / 104/201 (51%) | 2e-31
ABG06239 | Novel human diagnostic protein #6230 - Homo sapiens, 580 aa. [WO200175067-A2, 11-OCT-2001] | 22..83 / 428..489 | 62/62 (100%) / 62/62 (100%) | 2e-31



In a BLAST search of public sequence databases, the NOV85a protein was found to
have homology to the proteins shown in the BLASTP data in Table 85E.

Table 85E. Public BLASTP Results for NOV85a

Protein Accession Number | Protein/Organism/Length | NOV85a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9D721 | 2310040A13Rik protein - Mus musculus (Mouse), 161 aa. | 1..161 / 1..161 | 156/161 (96%) / 158/161 (97%) | 4e-86
AAH30654 | Hypothetical 22.2 kDa protein - Homo sapiens (Human), 195 aa. | 1..146 / 1..146 | 145/146 (99%) / 145/146 (99%) | 1e-80
Q8TBZ3 | Similar to putative - Homo sapiens (Human), 569 aa. | 1..144 / 1..144 | 144/144 (100%) / 144/144 (100%) | 3e-80
AAH27497 | Similar to RIKEN cDNA 2310040A13 gene - Mus musculus (Mouse), 145 aa. | 1..144 / 1..144 | 142/144 (98%) / 142/144 (98%) | 6e-78
Q8R0J5 | Similar to putative - Mus musculus (Mouse), 539 aa. | 1..144 / 1..144 | 142/144 (98%) / 142/144 (98%) | 6e-78



PFam analysis predicts that the NOV85a protein contains the domains shown in the 
Table 85F. 



Table 85F. Domain Analysis of NOV85a

Pfam Domain | NOV85a Match Region | Identities/Similarities for the Matched Region | Expect Value




Example 86. 







The NOV86 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 86A. 



Table 86A. NOV86 Sequence Analysis

NOV86a, CG59849-01 DNA Sequence (SEQ ID NO: 303, 6064 bp)

ATGACCACCAAACGGAAAATCATCGGCCGTCTGGTGCCATGCCGATGTTTCCGAGGTGAAGAA
GAAATCATCTCAGTTTTAGATTACTCCCACTGCAGTCTTCAGCAGGTGCCAAAGGAGGTCTTT 
AACTTCGAACGAACATTAGAGGAGCTTTATCTAGATGCCAATCAAATTGAAGAACTACCCAAG 
CAATTGTTCAACTGTCAAGCTCTACGAAAACTAAGTATTCCTGATAACGACCTTTCAAATCTG 
CCAACCACTATTGCTAGTTTAGTTAATCTTAAAGAACTCGACATCAGTAAAAATGGTGTACAA 
GAATTTCCAGAAAACATAAAGTGCTGTAAGTGTTTAACAATTATTGAAGCCAGTGTCAATCCC 
ATTTCTAAGCTACCTGATGGCTTCACACAGCTCCTAAACCTGACCCAGCTCTACCTGAATGAC 
GCCTTTCTTGAATTTCTTCCAGCCAATTTTGGAAGGCTTGTCAAATTGCGGATCTTGGAGTTA 
AGAGAAAATCACTTGAAAACTCTACCAAAGATGCACAAACTGGCCCAGTTGGAAAGACTTGAC 
CTAGGCAATAATGAATTCAGTGAGCTGCCTGAAGTTCTGGATCAAATACAAAATTTGAGGGAG 
TTATGGATGGATAATAATGCATTACAAGTGTTACCTGGGTCTATAGGGAAGTTAAAGATGTTG 
GTATACCTGGATATGTCAAAAAACAGAATAGAAACAGTTGACATGGACATTTCTGGATGTGAA


GCCCTTGAGGACCTCTTATTGTCATCCAATATGTTGCAACAATTGCCTGATTCTATAGGTGGA 

CTTTTGAAAAAACTAACAACTCTAAAAGTAGATGACAATCAACTTACAATGCTACCCAATACA 

ATTGGAAGTTTATCTTTATTAGAAGAATTTGACTGTAGCTGTAATGAACTGGAGTCACTACCT

TCTACTATTGGCTACCTTCATAGTCTTCGGACATTAGCAGTTGATGAGAATTTCCTTCCAGAA 

TTACCCAGAGAAATTGGAAGTTGTAAGAATGTAACAGTCATGTCTCTACGCTCCAACAAATTA 

GAATTTCTTCCTGAAGAGATTGGACAGATGCAGAAACTAAGAGTCCTAAATTTGAGTGACAAC 

AGGTTGAAGAATTTACCATTCTCATTTACCAAACTTAAAGAGCTTGCAGCTTTGTGGCTTTCT 

GACAATCAGTCCAAAGCCCTTATCCCTTTACAAACAGAAGCCCATCCAGAAACAAAGCAAAGA 

GTATTGACTAACTACATGTTTCCCCAGCAGCCTCGTGGTGATGAAGATTTCCAGTCAGACAGT 

GACAGCTTTAACCCTACACTGTGGGAAGAGCAGAGACAACAACGCATGACTGTTGCCTTTGAA 

TTTGAAGACAAAAAAGAAGATGACGAAAATGCTGGGAAAGTTAAGCTCTCCTGCCAAGCCCCC 

TGGGAAAGGGGCCAGCGTGGGATTACTCTCCAACCTGCCAGACTGTCTGGCGATTGCTGCACA 

CCATGGGCCAGGTGTGATCAGCAGATCCAAGATATGCCCGTCCCCCAGAATGACCCACAGCTG

GCATGGGGTTGTATAAGTGGCCTCCAGCAGGAAAGGAGCATGTGTACTCCATTGCCAGTTGCA 

GCACAATCCACCACTCTTCCCTCTCTAAGTGGCAGACAGGTTGAAATAAACCTAAAACGATAT

CCAACTCCTTACCCAGAGGATTTAAAGAATATGGTAAAATCTGTTCAAAATTTGGTGGGTAAG 

CCAAGCCATGGAGTGCGTGTTGAGAATTCAAATCCAACTGCTAACACGGAGCAAACTGTGAAA 

GAAAAATATGAACACAAGTGGCCGGTAGCCCCAAAGGAGATTACAGTGGAGGATTCTTTTGTT

CATCCAGCTAATGAAATGAGGATTGGGGAACTTCACCCTTCATTAGCTGAGACCCCTCTGTAC 

CCACCCAAACTTGTTCTGCTAGGGAAGGACAAAAAAGAATCAACTGATGAGTCTGAAGTTGAC 

AAAACTCACTGTCTGAATAACAGTGTTTCCTCAGGCACTTACTCAGACTACTCGCCTTCCCAG 

GCTTCCTCAGGATCCTCTAATACCCGGGTTAAAGTGGGGTCCTTGCAGACAACAGCTAAAGAT 

GCAGTACATAATTCTTTGTGGGGTAACAGGATTGCACCATCTTTCCCACAGCCTCTTGATTCA 

AAGCCATTACTCAGCCAGCGGGAGGCTGTTCCCCCAGGCAATATACCACAGCGTCCTGACCGG 

CTGCCCATGAGTGATACTTTCACTGACAACTGGACTGATGGCTCGCATTATGACAACACAGGG 

TTTGTTGCTGAGGAAACCACAGCCGAGAATGCCAACAGTAATCCTCTCTTAAGTTCGAAATCT

AGAAGCACATCTTCGCATGGACGCAGGCCTTTGATCAGGCAAGACAGGATTGTTGGTGTTCCC 

CTGGAACTCGAGCAGTCTACACACAGACACACACCAGAAACAGAAGTGCCTCCTTCCAATCCT 

TGGCAGAATTGGACCAGAACCCCTAGTCCGTTTGAAGACAGGACCGCTTTTCCTTCCAAATTA 

GAGACAACCCCCACTACCAGCCCATTGCCTGAAAGGAAAGAACATATAAAGGAATCTACTGAA

ATACCTAGTCCTTTTTCTCCAGGCGTACCATGGGAGTATCATGATTCCAATCCCAACAGGAGT 

CTTAGTAATGTCTTTTCTCAAATCCATTGCCGCCCGGAATCTTCTAAAGGTGTTATTTCAATT 

AGCAAAAGCACAGAGAGGCTTTCCCCCCTAATGAAAGATATCAAGTCTAATAAATTCAAAAAG 

TCACAGAGTATCGATGAGATTGACATTGGTACATATAAGGTGTATAACATACCATTAGAAAAC 

TATGCTTCTGGGAGTGATCACTTAGGAAGCCACGAACGACCGGATAAGATGCTGGGACCAGAG 

CATGGTATGTCCAGTATGTCTCGAAGCCAGTCAGTCCCAATGCTGGATGATGAGATGCTCACC 

TACGGAAGTAGTAAGGGGCCACAACAACAAAAAGCTTCTATGACAAAAAAAGTCTATCAGTTT 

GACCAAAGCTTCAATCCTCAAGGATCAGTGGAAGTGAAAGCCGAAAAGAGGATACCACCCCCT 

TTTCAACACAATCCCGAGTACGTGCAACAGGCCAGCAAAAACATCGCCAAGGATTTGATTAGT 

CCTAGAGCTTACAGAGGATACCCACCGATGGAGCAAATGTTTTCATTTTCTCAGCCATCTGTG 

AATGAGGATGCTGTGGTGAATGCCCAGTTCGCAAGCCAAGGGGCCAGGGCGGGCTTCCTGAGA

AGGGCCGACTCCCTGGTGAGCGCCACAGAAATGGCCATGTTTAGAAGGGTCAATGAGCCTCAT 

GAGCTGCCCCCAACTGATAGGTACGGCAGACCCCCATATAGGGGAGGGCTGGATCGCCAAAGC 

AGCGTTACAGTGACTGAGTCCCAGTTCCTGAAAAGGAATGGCAGGTATGAAGATGAACACCCT 

TCATATCAAGAAGTGAAAGCTCAGGCGGGAAGTTTTCCGGTTAAAAACCTTACCCAAAGGAGG 

CCATTGTCTGCGAGAAGCTACAGTACAGAGAGTTACGGTGCCTCCCAAACCAGGCCAGTTTCA 

GCTAGGCCTACTATGGCAGCTCTTTTGGAAAAAATACCATCTGACTATAACTTGGGTAACTAT 

GGTGACAAGCCATCAGATAACAGTGATTTAAAGACGAGGCCTACTCCTGTGAAGGGAGAGGAG 

AGCTGTGGTAAAATGCCTGCAGACTGGAGACAACAGCTGCTTAGACATATAGAAGCTAGACGG

TTAGACAGGACCCCGTCCCAGCAAAGCAACATTTTAGACAATGGACAAGAAGATGTATCTCCT 

AGTGGCCAATGGAATCCTTATCCACTTGGGAGGCGGGATGTACCTCCGGACACCATTACTAAG

AAGGCAGGCAGCCACATCCAGACGTTGATGGGGTCCCAAAGCCTTCAGCATCGCAGCCGGGAG

CAGCAGCCGTATGAAGGAAATATAAACAAAGTGACCATCCAGCAATTTCAGTCACCATTGCCT 

ATTCAGATCCCCTCTTCACAGGCCACCCGGGGACCTCAGCCTGGACGGTGCTTAATTCAAACT 

AAAGGGCAAAGGAGTATGGATGGATATCCAGAGCAGTTTTGTGTGAGAATAGAAAAGAATCCT 

GGCCTTGGATTTAGTATCAGTGGTGGAATTAGTGGACAAGGAAATCCATTCAAACCTTCTGAC 

AAGGGTATCTTTGTTACTAGGGTTCAGCCTGATGGGCCAGCATCAAACCTACTGCAGCCTGGT 

GATAAGATCCTTCAGGCAAATGGACACAGTTTTGTACATATGGAACATGAAAAAGCTGTATTA

CTACTGAAGAGTTTCCAGAACACAGTAGACCTAGTTATTCAACGTGAGCTTACTGTCTAAATA 

TTTTTTATAAATAGTGAAGATACGTCTAGCCAGACCTAATGTTCAAAAATAAATTTATACATA 


GAAACAAATTTTGCCAATTGCTGGACCAATGGCAAACATTAGTGCCAAATGTATAATACTATA 


TGTTAGCACTGACCATCCTTAAAAAATGTTAACTCTATAAATATGATGTTCATGTGGTTATGT 


ATTAGTTTTAATTGTCAGCCTCTGGCTGTGCATTGGTGCAGTTTTGTTTCTGTTTTTGTTTTT




GTTTTTAATCAAATAAGTTTCTTCTCAAAATGGATTTCATATAATTTCGGAGCACGGAAGCAC 




ACACAAGCTCTTTATGAATTCTGCTCTCCATCAGAAACACTGCCTCAAAGTTGTATATGCCTT 


TATATAGAAAATACAAATATAAAGAATTGTAATTCCCATAAAATATTTCTAGCACAAGGTATA


TGTTGGCATATATACAAAAAGAATATAGAGAAAAACAATATTTTCATAAACTAAACATCTCAG 


ATAGAGAAAAAATATATCTTAAAATAAGACTTTACTATATTGAATCTTTTTCAATAAAAATTA 


CATGATAATGCCTTATGAAAGTAACTGTACATATGGTATAAAGTGTTTATATTTGGTTCCATA 


TTCATTTGCTAAATTCTCATGACACAGAGTGAAATATTTCATAAATTAGCCATTTATCTCTGG 


GACCCAAATAAAAATAGGATGAACTAATTTGTTCAATGCCTTTAGCTAATTACAATACATGCA 


GAGTTTAGAAACAGACTAAAGGTCATTGTAGTTAAGTCTTTTTCACCACAAATTTAAGCAGTG 


GATGATGGGTGGCAGGAAAGGTATTGCTTTATTTCTTTCAAGTTCATGTTGATTATAAACTGT 


AGCCCCTGTGATTTCTTTACTTGTAAATGTGGAATTTATTTGTGTGTTGCTTAATCTAATTTG 


CTGCTTTTTAAATTATTTAAAACGAATTTTGGAAATTGATAAAATTTATCATTACGAAAGACT 


GCTGTTAGAAAGTTATGGTAGGTGATTTTAAATCCTTGGTATTTAAATATGAAACTTCAAATA 


TAATTTCTCAGAGCTGTGGTCTACCTGTATCATTAATTTCAATGGCTGTTTTTCTGGGCAGAA 


ATAGATAAAATACTTTTTTTCCAAAAACAGTTTCAAGGTATGTAAAATCCTGAATGCTTTTTC 


ACTGAAGAGAAAGACAAGCATGGTTAATGTAGAATTATTTACTTTTCCATTGAAACTATTTTC 


CTGCATAAATGATCAAAATTTATTTTATAATCCTTTAAAATACTTATCTTTCATATTAGTCAT 


TAATTTAATTACAATATTAATTTGAATTTCCAGGATAATTTCCCGGAGTTGGTTGCATGCATT 


ATCTTTCATAATTTTACATAGTTCTTTTGTTATATAATGAATTTACTTTACATGCTAGTGTTT 


CAAGTATTGTATGAGGATTTTCACAATAGTATCACTGAATGATGTCACCAGAGCTCTGAGAAT


AATATTTGTAAGTTAACTGTTTTATGGGGACATTGAAAATATTGTATTTTTGTAGGGTCTATT 


AAAATGAGTGTCACTT 




ORF Start: ATG at 1; ORF Stop: TAA at 4468




NOV86a, CG59849-01 Protein Sequence (SEQ ID NO: 304, 1489 aa, MW at 167241.9kD)


MTTKRKIIGRLVPCRCFRGEEEIISVLDYSHCSLQQVPKEVFNFERTLEELYLDANQIEELPK
QLFNCQALRKLSIPDNDLSNLPTTIASLVNLKELDISKNGVQEFPENIKCCKCLTIIEASVNP 
ISKLPDGFTQLLNLTQLYLNDAFLEFLPANFGRLVKLRILELRENHLKTLPKMHKLAQLERLD 
LGNNEFSELPEVLDQIQNLRELWMDNNALQVLPGSIGKLKMLVYLDMSKNRIETVDMDISGCE 
ALEDLLLSSNMLQQLPDSIGGLLKKLTTLKVDDNQLTMLPNTIGSLSLLEEFDCSCNELESLP
STIGYLHSLRTLAVDENFLPELPREIGSCKNVTVMSLRSNKLEFLPEEIGQMQKLRVLNLSDN 
RLKNLPFSFTKLKELAALWLSDNQSKALIPLQTEAHPETKQRVLTNYMFPQQPRGDEDFQSDS 
DSFNPTLWEEQRQQRMTVAFEFEDKKEDDENAGKVKLSCQAPWERGQRGITLQPARLSGDCCT 
PWARCDQQIQDMPVPQNDPQLAWGCISGLQQERSMCTPLPVAAQSTTLPSLSGRQVEINLKRY 
PTPYPEDLKNMVKSVQNLVGKPSHGVRVENSNPTANTEQTVKEKYEHKWPVAPKEITVEDSFV 
HPANEMRIGELHPSLAETPLYPPKLVLLGKDKKESTDESEVDKTHCLNNSVSSGTYSDYSPSQ 
ASSGSSNTRVKVGSLQTTAKDAVHNSLWGNRIAPSFPQPLDSKPLLSQREAVPPGNIPQRPDR
LPMSDTFTDNWTDGSHYDNTGFVAEETTAENANSNPLLSSKSRSTSSHGRRPLIRQDRIVGVP 
LELEQSTHRHTPETEVPPSNPWQNWTRTPSPFEDRTAFPSKLETTPTTSPLPERKEHIKESTE 
IPSPFSPGVPWEYHDSNPNRSLSNVFSQIHCRPESSKGVISISKSTERLSPLMKDIKSNKFKK 
SQSIDEIDIGTYKVYNIPLENYASGSDHLGSHERPDKMLGPEHGMSSMSRSQSVPMLDDEMLT
YGSSKGPQQQKASMTKKVYQFDQSFNPQGSVEVKAEKRIPPPFQHNPEYVQQASKNIAKDLIS 
PRAYRGYPPMEQMFSFSQPSVNEDAVVNAQFASQGARAGFLRRADSLVSATEMAMFRRVNEPH 
ELPPTDRYGRPPYRGGLDRQSSVTVTESQFLKRNGRYEDEHPSYQEVKAQAGSFPVKNLTQRR 
PLSARSYSTESYGASQTRPVSARPTMAALLEKIPSDYNLGNYGDKPSDNSDLKTRPTPVKGEE 
SCGKMPADWRQQLLRHIEARRLDRTPSQQSNILDNGQEDVSPSGQWNPYPLGRRDVPPDTITK 
KAGSHIQTLMGSQSLQHRSREQQPYEGNINKVTIQQFQSPLPIQIPSSQATRGPQPGRCLIQT 
KGQRSMDGYPEQFCVRIEKNPGLGFSISGGISGQGNPFKPSDKGIFVTRVQPDGPASNLLQPG 
DKILQANGHSFVHMEHEKAVLLLKSFQNTVDLVIQRELTV 




NOV86b, CG59849-02 DNA Sequence (SEQ ID NO: 305, 1260 bp)


ATGACCACCAAACGGAAAATCATCGGCCGTCTGGTGCCATGCCGATGTTTCCGAGGTGAAGAA 
GAAATCATCGCAGTTTTAGATTACTCCCACTGCAGTCTTCAGCAGGTGCCAAAGGAGGTCTTT 
AACTTCGAACGAACATTAGAGGAGCTTTATCTAGATGCCAATCAAATTGAAGAACTACCCAAG
CAATTGTTCAACTGTCAAGCTCTACGAAAACTAAGTATTCCTGATAACGACCTTTCAAATCTG 
CCAACCACTATTGCTAGTTTAGTTAATCTTAAAGAACTCGACATCAGTAAAAATGGTGTACAA 
GAATTTCCAGAAAACATAAAGTGCTGTAAGTGTTTAACAATTATTGAAGCCAGTGTCAATCCC 
ATTTCTAAACTACCTGATGGCTTCACACAGCTCCTAAACCTGACCCAGCTCTACCTGAATGAC 
GCCTTTCTTGAATTTCTTCCAGCCAATTTTGGAAGACTTGTCAAATTGCGGATCTTGGAGTTA
AGAGAAAATCACTTGAAAACTCTACCAAAGTCAATGCACAAACTGGCCCAGTTGGAAAGACTT 
GACCTAGGCAATAATGAATTCGGTGAGCTGCCTGAAGTTCTGGATCAAATACAAAATTTGAGG
GAGTTATGGATGGATAATAATGCATTACAAGTGTTACCTGGGGCAGGCAGCCACATCCAGACG 
TTGATGGGGTCCCAAAGCCTTCAGCATCGCAGCCGGGAGCAGCAGCCGTATGAAGGAAATATA
AACAAAGTGACCATCCAGCAATTTCAGTCACCATTGCCTATTCAGATCCCCTCTTCACAGGCC 
ACCCGGGGACCTCAGCCTGGACGGTGCTTAATTCAAAATAAAGGGCAAAGGAGTATGGATGGA 
TATCCAGAGCAGTTTTGTGTGAGAATAGAAAAGAATCCTGGCCTTGGATTTAGTATCAGTGGT 
GGAATTAGTGGACAAGGAAATCCATTCAAACCTTCTGACAAGGGTATCTTTGTTACTAGGGTT 
CAGCCTGATGGGCCAGCATCAAACCTACTGCAGCCTGGTGATAAGATCCTTCAGGCAAATGGA 
CACAGTTTTGTACATATGGAACATGAAAAAGCTGTATTACTACCGAAGAGTTTCCAGAACACA 
GTAGACCTAGTTATTCAACGTGAGCTTACTGTCTAAATATTTTTTATAAATAGTGAAGATACG
TCTAGCCAGACCTAATGTTCAAAAATAAATTTATACATAGAAACAAATTTTGCCAATTGCTGG 


ORF Start: ATG at 1 | ORF Stop: TAA at 1168
SEQ ID NO: 306 | 389 aa | MW at 43792.9kD
NOV86b, CG59849-02 Protein Sequence


MTTKRKIIGRLVPCRCFRGEEEIIAVLDYSHCSLQQVPKEVFNFERTLEELYLDANQIEELPK 
QLFNCQALRKLSIPDNDLSNLPTTIASLVNLKELDISKNGVQEFPENIKCCKCLTIIEASVNP
ISKLPDGFTQLLNLTQLYLNDAFLEFLPANFGRLVKLRILELRENHLKTLPKSMHKLAQLERL
DLGNNEFGELPEVLDQIQNLRELWMDNNALQVLPGAGSHIQTLMGSQSLQHRSREQQPYEGNI 
NKVTIQQFQSPLPIQIPSSQATRGPQPGRCLIQNKGQRSMDGYPEQFCVRIEKNPGLGFSISG 
GISGQGNPFKPSDKGIFVTRVQPDGPASNLLQPGDKILQANGHSFVHMEHEKAVLLPKSFQNT 
VDLVIQRELTV 
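
The ORF coordinates and the protein length reported above for NOV86b can be cross-checked with a short calculation. The following Python sketch is illustrative only and is not part of the original analysis. The function name `orf_protein_length` is hypothetical, and the sketch assumes that positions are 1-based and that the reported ORF Stop value is the position of the first base of the stop codon.

```python
def orf_protein_length(start: int, stop: int) -> int:
    """Return the number of residues encoded by an ORF.

    Assumes `start` is the 1-based position of the first base of the
    start codon and `stop` is the 1-based position of the first base
    of the stop codon (an assumption about the table's convention).
    """
    coding_bases = stop - start
    if coding_bases <= 0 or coding_bases % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return coding_bases // 3


# NOV86b: ORF Start at 1, ORF Stop at 1168
print(orf_protein_length(1, 1168))  # 389
```

Under these assumptions the result agrees with the 389 aa length given for SEQ ID NO: 306.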



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 86B. 



Table 86B. Comparison of NOV86a against NOV86b.

Protein Sequence | NOV86a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV86b | 1..232; 1..233 | 210/233 (90%); 215/233 (92%)
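
The percentages in Table 86B can be reproduced from the match counts. The Python sketch below is illustrative and not part of the original analysis; the helper name `truncated_percent` is hypothetical. It assumes that each percentage is the integer part of 100 x matches / aligned length, which is consistent with the two values in Table 86B but is not stated in the source.

```python
def truncated_percent(matches: int, aligned_length: int) -> int:
    """Integer part of the percentage matches/aligned_length (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Values from Table 86B (NOV86a against NOV86b)
print(truncated_percent(210, 233))  # identities   -> 90
print(truncated_percent(215, 233))  # similarities -> 92
```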




Further analysis of the NOV86a protein yielded the following properties shown in 
Table 86C. 



Table 86C. Protein Sequence Properties NOV86a

PSort analysis: 0.5192 probability located in mitochondrial matrix space; 0.3000 probability located in microbody (peroxisome); 0.2487 probability located in mitochondrial inner membrane; 0.2487 probability located in mitochondrial intermembrane space
SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV86a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 86D.
Table 86D. Geneseq Results for NOV86a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV86a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM52529 | Human Erbin mutein #5 - Homo sapiens, 1371 aa. [FR2807437-A1, 12-OCT-2001] | 1..1488; 1..1370 | 566/1557 (36%); 790/1557 (50%) | 0.0
AAM52528 | Human Erbin mutein #4 - Homo sapiens, 1371 aa. [FR2807437-A1, 12-OCT-2001] | 1..1488; 1..1370 | 565/1557 (36%); 788/1557 (50%) | 0.0
AAM52530 | Human Erbin mutein #6 - Homo sapiens, 1371 aa. [FR2807437-A1, 12-OCT-2001] | 1..1488; 1..1370 | 565/1557 (36%); 787/1557 (50%) | 0.0
AAM52527 | Human Erbin mutein #3 - Homo sapiens, 1371 aa. [FR2807437-A1, 12-OCT-2001] | 1..1488; 1..1370 | 566/1557 (36%); 788/1557 (50%) | 0.0
AAM52526 | Human Erbin mutein #2 - Homo sapiens, 1419 aa. [FR2807437-A1, 12-OCT-2001] | 1..1488; 1..1418 | 568/1579 (35%); 793/1579 (49%) | 0.0



In a BLAST search of public sequence databases, the NOV86a protein was found to
have homology to the proteins shown in the BLASTP data in Table 86E.



Table 86E. Public BLASTP Results for NOV86a

Protein Accession Number | Protein/Organism/Length | NOV86a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q96NW7 | Densin-180 - Homo sapiens (Human), 1537 aa. | 1..1489; 1..1537 | 1486/1538 (96%); 1487/1538 (96%) | 0.0
P70587 | Densin-180 - Rattus norvegicus (Rat), 1495 aa. | 1..1489; 6..1495 | 1421/1491 (95%); 1454/1491 (97%) | 0.0
Q9P2I2 | KIAA1365 protein - Homo sapiens (Human), 831 aa (fragment). | 659..1489; 1..831 | 829/831 (99%); 830/831 (99%) | 0.0
Q96RT1 | Densin-180-like protein - Homo sapiens (Human), 1412 aa. | 1..1488; 1..14U | 573/1562 (36%); 804/1562 (50%) | 0.0
Q9NR18 | Erbb2-interacting protein ERBIN - Homo sapiens (Human), 1371 aa. | 1..1488; 1..1370 | 567/1557 (36%); 789/1557 (50%) | 0.0




PFam analysis predicts that the NOV86a protein contains the domains shown in
Table 86F.
Table 86F. Domain Analysis of NOV86a

Pfam Domain | NOV86a Match Region | Identities/Similarities for the Matched Region | Expect Value
LRR | 47..69 | 9/25 (36%); 19/25 (76%) | 0.13
LRR | 93..115 | 8/25 (32%); 19/25 (76%) | 0.83
LRR | 184..206 | 10/25 (40%); 19/25 (76%) | 0.048
LRR | 207..229 | 12/25 (48%); 19/25 (76%) | 0.041
LRR | 253..275 | 12/25 (48%); 18/25 (72%) | 0.71
LRR | 277..299 | 8/25 (32%); 21/25 (84%) | 0.13
LRR | 369..391 | 11/25 (44%); 20/25 (80%) | 0.00084
PDZ | 1400..1486 | 34/93 (37%); 74/93 (80%) | 8.5e-19
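
The match regions in Table 86F are written as inclusive residue ranges of the form start..end. As an illustration only, the following Python sketch converts such a range into a residue count; the helper name `region_length` is hypothetical and the inclusive interpretation of the range is an assumption.

```python
def region_length(region: str) -> int:
    """Length of an inclusive residue range written as 'start..end'."""
    start_text, end_text = region.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


print(region_length("47..69"))      # first LRR match region -> 23
print(region_length("1400..1486"))  # PDZ match region -> 87
```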



Example 87. 

The NOV87 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 87A.



Table 87A. NOV87 Sequence Analysis

SEQ ID NO: 307 | 2062 bp
NOV87a, CG59920-01 DNA Sequence


GAGGGGACGTCGTCGTAGAGGGCCGGAGCGGGCGGGCGGCGACGGACCCGGCTCCCGCGCAGG 


ACGGAGCCGTGGCTCAGGTCGGCCCCTCCCCAACACCACCCCGGGCCTCCGCCCCTTCCTGGG 


CCTCTCGGTGGAGCAGGGACCCGAACCGGTGCCCATCCAGTCCGGTGCCATCTGAAGCCCCCT


TCCCAGAAAATGAGCCACAGAGCAAGCTGACCCCAGCGACACAGCCCCCCAGCCCTACTATAT 


TTCCGTTCCTATCAAAAAATGGATGACTCGGAGACAGGTTTCAATCTGAAAGTCGTCCTGGTC 
AGTTTCAAGCAGTGTCTCGATGAGAAGGAAGAGGTCTTGCTGGACCCCTACATTGCCAGCTGG 
AAGGGCCTGGTCAGGTTTCTGAACAGCCTGGGCACCATCTTCTCATTCATCTCCAAGGACGTG 
GTCTCCAAGCTGCGGATCATGGAGCGCCTCAGGGGCGGCCCGCAGAGCGAGCACTACCGCAGC 
CTGCAGGCCATGGTGGCCCACGAGCTGAGCAACCGGCTGGTGGACCTGGAGGGCCGCTCCCAC 
CACCCGGAGTCTGGCTGCCGGACGGTGCTGCGCCTGCACCGCGCCCTGCACTGGCTGCAGCTG
TTCCTGGAGGGCCTGCGTACCAGCCCCGAGGACGCACGCACCTCCGCACTCTGCGCCGACTCC 
TACAACGCCTCGCTGGCCGCCTACCACCCCTGGGTCGTGCGCCGTGCCGTCACCGTGGCCTTC 
TGCACGCTGCCCACACGCGAGGTCTTCCTGGAGGCCATGAACGTGGGGCCCCCGGAGCAGGCC 
GTGCAGATGCTAGGCGAGGCCCTCCCCTTCATCCAGCGTGTCTACAACGTCTCCCAGAAGCTC 
TACGCCGAGCACTCCCTGCTGGACCTGCCCTAGAGGCGGGAAGCCAGGGCCGCACCGGCTTTC 


CTGCTGCAGATCTGGGCTGCGGTGGCCAGGGCCGTGAGTCCCGTGGCAGAGCCTTCTGGGCGC


TGCGGGAACAGGAGATCCTCTGTCGCCCCTGTGAGCTGAGCTGGTTAGGAACCACAGACTGTG 


ACAGAGAAGGTGGCGACCAGCCCAGAAGAGGCCCACCCTCTCGGTCCGGAACAAGACGCCTCA 


GCCACGGCTCCCCCTCGGCCTATTACACGCGTGCGCAGCCAGGCCTCGCCAGGGTGCGGTGCA


GAGCAGAGCAGGCAGGGGTGGGGGCCGGGCCCGCAAGAGCCCGAAAGGTCGCCACCCCCTAGC 


CTGTGGGGTGCATCTGCGAACCAGGGTGAAGTCACAGGTCCCGGGGTGTGGAGGCTCCATCCT 


TTCTCCTTTCTGCCAGCCGATGTGTCCTCATCTCAGGCCCGTGCCTGGGACCCCGTGTCTGCC 


CAGGTGGGCAGCCTTGAGCCCAGGGGACTCAGTGCCCTCCATGCCCTGGCTGGCAGAAACCCT


CAACAGCAGTCTGGGCACTGTGGGGCTCTCCCCGCCTCTCCTGCCTTGTTTGCCCCTCAGCGT 


GCCAGGCAGACTGGGGGCAGGACAGCCGGAAGCTGAGACCAAGGCTCCTCACAGAAGGGCCCA


GGAAGTCCCCGCCCTTGGGACAGCCTCCTCCGTAGCCCCTGCACGGCACCAGTTCCCCGAGGG 


ACGCAGCAGGCCGCCTCCCGCAGCGGCCGTGGGTCTGCACAGCCCAGCCCAGCCCAAGGCCCC 


CAGGAGCTGGGACTCTGCTACACCCAGTGAAATGCTGTGTCCCTTCTCCCCCGTGCCCCTTGA 


TGCCCCCTCCCCACAGTGCTCAGGAGACCCGTGGGGCACGGAACAGGAGGGTCTGGACCCTGT 


GGCCCAGCCAAAGGCTACCAGACAGCCACAACCAGCCCAGCCACCATCCAGTGCCTGGGGCCT 


GGCCACTGGCTCTTCACAGTGGACCCCAGCACCTCGGGGTGGCAGAGGGACGGCCCCCACGGC 


CCAGCAGACATGCGAGCTTCCAGAGTGCAATCTATGTGATGTCTTCCAACGTTAATAAATCAC 


ACAGCCTCCCAGGAGGGAGACGCTGGGGTGCAAAAAAAAAGCAAAA 




ORF Start: ATG at 271 | ORF Stop: TAG at 913
SEQ ID NO: 308 | 214 aa | MW at 24265.6kD
NOV87a, CG59920-01 Protein Sequence


MDDSETGFNLKVVLVSFKQCLDEKEEVLLDPYIASWKGLVRFLNSLGTIFSFISKDVVSKLRI
MERLRGGPQSEHYRSLQAMVAHELSNRLVDLEGRSHHPESGCRTVLRLHRALHWLQLFLEGLR 
TSPEDARTSALCADSYNASLAAYHPWVVRRAVTVAFCTLPTREVFLEAMNVGPPEQAVQMLGE
ALPFIQRVYNVSQKLYAEHSLLDLP 



SEQ ID NO: 309 | 723 bp
NOV87b, CG59920-02 DNA Sequence



ACAGCCCCCCAGCCCTACTGTATTTCCGTTCCTATCAAAAAATGGATGACTCGGAGACAGGTT 



TCAATCTGAAAGTCGTCCTGGTCAGTTTCAAGCAGTGTCTCGATGAGAAGGAAGAGGTCTTGC 
TGGACCCCTACATTGCCAGCTGGAAGGGCCTGGTCAGGTTTCTGAACAGCCTGGGCACCATCT
TCTCATTCATCTCCAAGGACGTGGTCTCCAAGCTGCGGATCATGGAGCGCCTCAGGGGCGGCC
CGCAGAGCGAGCACTACCGCAGCCTGCAGGCCATGGTGGCCCACGAGCTGAGCAACCGGCTGG 
TGGACCTGGAGCGCCGCTCCCACCACCCGGAGTCTGGCTGCCGGACGGTGCTGCGCCTGCACC 
GCGCCCTGCACTGGCTGCAGCTGTTCCTGGAGGGCCTGCGTACCAGCCCCGAGGACGCACGCA
CCTCCGCGCTCTGCGCCGACTCCTACAACGCCTCGCTGGCCGCCTACCACCCCTGGGTCGTGC 
GCCGTGCCGTCACCGTGGCCTTCTGCACGCTGCCCACACGCGAGGTCTTCCTGGAGGCCATGA 
ACGTGGGGCCCCCGGAGCAGGCCGTGCAGATGCTAGGCGAGGCCCTCCCCTTCATCCAGCGTG 
TCTACAACGTCTCCCAGAAGCTCTACGCCGAGCACTCCCTGCTGGACCTGCCCTAGAGGCGGG
AAGCCAGGGCCGCACCGGCTTTCCTGCTGC 



ORF Start: ATG at 42 | ORF Stop: TAG at 684
SEQ ID NO: 310 | 214 aa | MW at 24364.8kD
NOV87b, CG59920-02 Protein Sequence



MDDSETGFNLKVVLVSFKQCLDEKEEVLLDPYIASWKGLVRFLNSLGTIFSFISKDVVSKLRI
MERLRGGPQSEHYRSLQAMVAHELSNRLVDLERRSHHPESGCRTVLRLHRALHWLQLFLEGLR 
TSPEDARTSALCADSYNASLAAYHPWVVRRAVTVAFCTLPTREVFLEAMNVGPPEQAVQMLGE
ALPFIQRVYNVSQKLYAEHSLLDLP



SEQ ID NO: 311 | 664 bp
NOV87c, 277583351 DNA Sequence



CACCGGATCCACCATGGATGACTCGGAGACAGGTTTCAATCTGAAAGTCGTCCTGGTCAGTTTC 
AAGCAGTGTCTCGATGAGAAGGAAGAGGTCTTGCTGGACCCCTACATTGCCAGCTGGAAGGGCC 
TGGTCAGGTTTCTGAACAGCCTGGGCACCATCTTCTCATTCATCTCCAAGGACGTGGTCTCCAA 
GCTGCGGATCATGGAGCGCCTCAGGGGCGGCCCGCAGAGCGAGCACTACCGCAGCCTGCAGGCC
ATGGTGGCCCACGAGCTGAGCAACCGGCTGGTGGACCTGGAGCGCCGCTCCCACCACCCGGAGT 
CTGGCTGCCGGACGGTGCTGCGCCTGCACCGCGCCCTGCACTGGCTGCAGCTGTTCCTGGAGGG
CCTGCGTACCAGCCCCGAGGACGCACGCACCTCCGCGCTCTGCGCCGACTCCTACAACGCCTCG 
CTGGCCGCCTACCACCCCTGGGTCGTGCGCCGTGCCGTCACCGTGGCCTTCTGCACGCTGCCCA 
CACGCGAGGTCTTCCTGGAGGCCATGAACGTGGGGCCCCCGGAGCAGGCCGTGCAGATGCTAGG 
CGAGGCCCTCCCCTTCATCCAGCGTGTCTACAACGTCTCCCAGAAGCTCTACGCCGAGCACTCC
CTGCTGGACCTGCCCCTCGAGGGC 



ORF Start: at 2 | ORF Stop: end of sequence
SEQ ID NO: 312 | 221 aa | MW at 25010.4kD
NOV87c, 277583351 Protein Sequence



TGSTMDDSETGFNLKVVLVSFKQCLDEKEEVLLDPYIASWKGLVRFLNSLGTIFSFISKDVVS
KLRIMERLRGGPQSEHYRSLQAMVAHELSNRLVDLERRSHHPESGCRTVLRLHRALHWLQLFL 
EGLRTSPEDARTSALCADSYNASLAAYHPWVVRRAVTVAFCTLPTREVFLEAMNVGPPEQAVQ
MLGEALPFIQRVYNVSQKLYAEHSLLDLPLEG 
SEQ ID NO: 313 | 2062 bp
NOV87d, CG59920-01 DNA Sequence


GAGGGGACGTCGTCGTAGAGGGCCGGAGCGGGCGGGCGGCGACGGACCCGGCTCCCGCGCAGG 


ACGGAGCCGTGGCTCAGGTCGGCCCCTCCCCAACACCACCCCGGGCCTCCGCCCCTTCCTGGG


CCTCTCGGTGGAGCAGGGACCCGAACCGGTGCCCATCCAGTCCGGTGCCATCTGAAGCCCCCT 


TCCCAGAAAATGAGCCACAGAGCAAGCTGACCCCAGCGACACAGCCCCCCAGCCCTACTATAT 


TTCCGTTCCTATCAAAAAATGGATGACTCGGAGACAGGTTTCAATCTGAAAGTCGTCCTGGTC 
AGTTTCAAGCAGTGTCTCGATGAGAAGGAAGAGGTCTTGCTGGACCCCTACATTGCCAGCTGG 
AAGGGCCTGGTCAGGTTTCTGAACAGCCTGGGCACCATCTTCTCATTCATCTCCAAGGACGTG 
GTCTCCAAGCTGCGGATCATGGAGCGCCTCAGGGGCGGCCCGCAGAGCGAGCACTACCGCAGC 
CTGCAGGCCATGGTGGCCCACGAGCTGAGCAACCGGCTGGTGGACCTGGAGGGCCGCTCCCAC 
CACCCGGAGTCTGGCTGCCGGACGGTGCTGCGCCTGCACCGCGCCCTGCACTGGCTGCAGCTG 
TTCCTGGAGGGCCTGCGTACCAGCCCCGAGGACGCACGCACCTCCGCACTCTGCGCCGACTCC 
TACAACGCCTCGCTGGCCGCCTACCACCCCTGGGTCGTGCGCCGTGCCGTCACCGTGGCCTTC 
TGCACGCTGCCCACACGCGAGGTCTTCCTGGAGGCCATGAACGTGGGGCCCCCGGAGCAGGCC 
GTGCAGATGCTAGGCGAGGCCCTCCCCTTCATCCAGCGTGTCTACAACGTCTCCCAGAAGCTC 
TACGCCGAGCACTCCCTGCTGGACCTGCCCTAGAGGCGGGAAGCCAGGGCCGCACCGGCTTTC 


CTGCTGCAGATCTGGGCTGCGGTGGCCAGGGCCGTGAGTCCCGTGGCAGAGCCTTCTGGGCGC 


TGCGGGAACAGGAGATCCTCTGTCGCCCCTGTGAGCTGAGCTGGTTAGGAACCACAGACTGTG 


ACAGAGAAGGTGGCGACCAGCCCAGAAGAGGCCCACCCTCTCGGTCCGGAACAAGACGCCTCA 


GCCACGGCTCCCCCTCGGCCTATTACACGCGTGCGCAGCCAGGCCTCGCCAGGGTGCGGTGCA 


GAGCAGAGCAGGCAGGGGTGGGGGCCGGGCCCGCAAGAGCCCGAAAGGTCGCCACCCCCTAGC 


CTGTGGGGTGCATCTGCGAACCAGGGTGAAGTCACAGGTCCCGGGGTGTGGAGGCTCCATCCT 


TTCTCCTTTCTGCCAGCCGATGTGTCCTCATCTCAGGCCCGTGCCTGGGACCCCGTGTCTGCC


CAGGTGGGCAGCCTTGAGCCCAGGGGACTCAGTGCCCTCCATGCCCTGGCTGGCAGAAACCCT 


CAACAGCAGTCTGGGCACTGTGGGGCTCTCCCCGCCTCTCCTGCCTTGTTTGCCCCTCAGCGT


GCCAGGCAGACTGGGGGCAGGACAGCCGGAAGCTGAGACCAAGGCTCCTCACAGAAGGGCCCA 


GGAAGTCCCCGCCCTTGGGACAGCCTCCTCCGTAGCCCCTGCACGGCACCAGTTCCCCGAGGG 


ACGCAGCAGGCCGCCTCCCGCAGCGGCCGTGGGTCTGCACAGCCCAGCCCAGCCCAAGGCCCC 


CAGGAGCTGGGACTCTGCTACACCCAGTGAAATGCTGTGTCCCTTCTCCCCCGTGCCCCTTGA 


TGCCCCCTCCCCACAGTGCTCAGGAGACCCGTGGGGCACGGAACAGGAGGGTCTGGACCCTGT 


GGCCCAGCCAAAGGCTACCAGACAGCCACAACCAGCCCAGCCACCATCCAGTGCCTGGGGCCT 


GGCCACTGGCTCTTCACAGTGGACCCCAGCACCTCGGGGTGGCAGAGGGACGGCCCCCACGGC 


CCAGCAGACATGCGAGCTTCCAGAGTGCAATCTATGTGATGTCTTCCAACGTTAATAAATCAC 


ACAGCCTCCCAGGAGGGAGACGCTGGGGTGCAAAAAAAAAGCAAAA 




ORF Start: ATG at 271 | ORF Stop: TAG at 913
SEQ ID NO: 314 | 214 aa | MW at 24265.6kD
NOV87d, CG59920-01 Protein Sequence


MDDSETGFNLKVVLVSFKQCLDEKEEVLLDPYIASWKGLVRFLNSLGTIFSFISKDVVSKLRI
MERLRGGPQSEHYRSLQAMVAHELSNRLVDLEGRSHHPESGCRTVLRLHRALHWLQLFLEGLR 
TSPEDARTSALCADSYNASLAAYHPWVVRRAVTVAFCTLPTREVFLEAMNVGPPEQAVQMLGE
ALPFIQRVYNVSQKLYAEHSLLDLP 




SEQ ID NO: 315 | 1279 bp
NOV87e, 308559628 DNA Sequence


GCCCTCGAGGTAGGGGTGATTCAGGGTGTGCTCCATGATGGTCAGAAGCGCCAGCCACGTCTC 


CTTGGCTGTGGGGATGATCTGGGAGGCTGGCAGCAGGAAGCCATAGCGCCCAGTGTCCCGGAG 


CTCGAAGGTGAAGGAGTACTTGATGCCCTGGCTGTAGGTCCAGTCAATAGTGCTTCCACTGGC


TTGATAAATTGCCTTGATGATGCTGCCATAGTTGAACTTGGTCCCGTAGAGAGAGGCCAGGGC 


TGTCACAGCAGCCTTGGAAAGCTGATCCAGCTCATCCTGGTCAGGGACTGGTTCTGTTTTGTA 


GCCATAGGGATACATGAGGAGCTGGGAGTAGCTGTGGATGGAGATGAAGGCCTTGATGTTCCC 


ATGGTCCTTCACAAAGTCTACAATGGACTTGACCTCCACTTCGGAATTGGCAAACTTGCCGTG 


GTAAGTCTCCGAGCAGGGGTTACTGCTGGCTCCGGACAACCCAAAGCCAGCGTCCCAGTTCCT
GTTGGGGTCCACGCCAATACAGAGGGAGCCTGCTGTGTGGGACCGAGTCTTGCGCCACATGCG 
ATTCGTGCTGTGCGTGAAGGCAAAGCCATCAGGGTTGGTGACGATCTCCAGGAAGATGTCCAA 
GGTGTCGAGAATGGCGGTGAAAGCTGCATCCTGCCCGTAGTCTTGAGTGATCTTCTTTGCAAA 
CCAGACCCCACTGGCCTGGGTGACCCACTCCCGGGAATGGATGCCCGTGTCGATCCAGATGGC 


TGGACGCTTACTGCCCCCCGTGCTGAACTTCAGCACGTAAATGGGACGCCCTTCATAGGTGTT 


GCCAATCTGGATCTTGCTGACAAGGTGCGGGTTCTCCGCCACCAGCAGGTCCAGGAAGTCATA 


GATCTCCTCCAGGGTGTGGTAGGTGGCGTAGTTAAAAGTGTCGGTGGAGCGCGCCCGGGACCG 


GAAGGCGAACATCTGCTCCTGCTCCTCGTCCAGCAGCGACTGCACGTCCTCGATCATGGTCTC 


ATAGCTGATGCCGTGGGACTCCAGAAAGATCTTGACCGCCTGGATGCTGGGGAAGGGCACTCG 


GACGTCGATGGGGGAGCCAGGGTGGGCAGGCCCCCGCCAGAAGTCCAGCTGCAGGTGCTCCAG 




GTCCTCCAGCTCCTTCACCTTCTGTACCTGGGCCTCATCGGCTACAGAGATTCGGAGCACCTG 




ATGCCCCACAAAGTCCTCCTTGCCAAAGACAGCCCCCAACAGGACACTCAACACCAGCAACCC 


CCGCATGGTGGATCCGGTG 




ORF Start: at 446 | ORF Stop: TAG at 668
SEQ ID NO: 316 | 74 aa | MW at 8100.6kD
NOV87e, 308559628 Protein Sequence



VSEQGLLLAPDNPKPASQFLLGSTPIQREPAVWDRVLRHMRFVLCVKAKPSGLVTISRKMSKV
SRMAVKAASCP 
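
The protein sequences in Table 87A correspond to conceptual translations of the listed open reading frames. The following Python sketch is an illustration only and is not the method used to prepare the table; it assumes the standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical. Whitespace in the input is ignored, and translation stops at the first in-frame stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start: int = 1) -> str:
    """Translate `dna` from the 1-based position `start` to the first stop codon."""
    seq = "".join(dna.split()).upper()  # drop spaces and line breaks
    residues = []
    for i in range(start - 1, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]  # KeyError on non-ACGT input
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# First 45 bases of the NOV87a ORF, which begins at the ATG at position 271
fragment = "ATGGATGACTCG GAGACAGGT TTCAATCTGAAAGTCGTCCTGGTC"
print(translate(fragment))  # MDDSETGFNLKVVLV
```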



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 87B. 



Table 87B. Comparison of NOV87a against NOV87b through NOV87e.

Protein Sequence | NOV87a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV87b | 1..214; 1..214 | 213/214 (99%); 213/214 (99%)
NOV87c | 1..214; 5..218 | 213/214 (99%); 213/214 (99%)
NOV87d | 1..214; 1..214 | 214/214 (100%); 214/214 (100%)
NOV87e | No Significant Alignment Found.
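
The identity counts in Table 87B compare residues position by position over the matched region. A minimal Python sketch of such a count is given below for illustration; it assumes an ungapped alignment of two equal-length strings, the helper name `count_identities` is hypothetical, and the two inputs are 14-residue excerpts of the NOV87a and NOV87b sequences of Table 87A rather than the full-length proteins.

```python
from typing import Tuple


def count_identities(a: str, b: str) -> Tuple[int, int]:
    """Return (identical positions, aligned length) for an ungapped alignment."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison requires equal-length sequences")
    identical = sum(x == y for x, y in zip(a, b))
    return identical, len(a)


nov87a_excerpt = "LVDLEGRSHHPESG"
nov87b_excerpt = "LVDLERRSHHPESG"
print(count_identities(nov87a_excerpt, nov87b_excerpt))  # (13, 14)
```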




Further analysis of the NOV87a protein yielded the following properties shown in 
Table 87C. 



Table 87C. Protein Sequence Properties NOV87a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3577 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV87a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 87D.



Table 87D. Geneseq Results for NOV87a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV87a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAB41812 | sequence SEQ ID NO:3152 - Homo sapiens, 214 aa. [WO200058473-A2, 05-OCT-2000] | 1..214 | 214/214 (100%) | e-121
ABB65103 | Drosophila melanogaster polypeptide SEQ ID NO 22101 - Drosophila melanogaster, 482 aa. [WO200171042-A2, 27-SEP-2001] | 8..214; 279..482 | 63/209 (30%); 120/209 (57%) | 6e-24
AAU79185 | Human phosphatidyl inositol-four-phosphate adaptor protein-2 (FAPP-2) - Homo sapiens, 507 aa. [WO200212276-A2, 14-FEB-2002] | 23..177; 326..472 | 40/156 (25%); 77/156 (48%) | 6e-07
ABG20808 | Novel human diagnostic protein #20799 - Homo sapiens, 391 aa. [WO200175067-A2, 11-OCT-2001] | 23..177; 210..356 | 40/156 (25%); 77/156 (48%) | 6e-07
AAB95725 | Human protein sequence SEQ ID NO:18600 - Homo sapiens, 193 aa. [EP1074617-A2, 07-FEB-2001] | 23..177; 12..158 | 39/156 (25%); 77/156 (49%) | 8e-07



In a BLAST search of public sequence databases, the NOV87a protein was found to
have homology to the proteins shown in the BLASTP data in Table 87E.



Table 87E. Public BLASTP Results for NOV87a

Protein Accession Number | Protein/Organism/Length | NOV87a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q99LU9 | Hypothetical 24.6 kDa protein - Mus musculus (Mouse), 216 aa. | 1..214; 1..216 | 169/216 (78%); 193/216 (89%) | 9e-95
AAH30735 | Similar to hypothetical protein, MGC:7473 - Mus musculus (Mouse), 321 aa. | 17..213; 119..320 | 74/204 (36%); 108/204 (52%) | 1e-27
AAH25515 | Hypothetical protein - Mus musculus (Mouse), 207 aa. | 17..213; 5..206 | 74/204 (36%); 108/204 (52%) | 1e-27
AAM70862 | CG30392-PA - Drosophila melanogaster (Fruit fly), 223 aa. | 8..214; 20..223 | 63/209 (30%); 120/209 (57%) | 2e-23
Q9W2I4 | CG10509 protein - Drosophila melanogaster (Fruit fly), 482 aa. | 8..214; 279..482 | 63/209 (30%); 120/209 (57%) | 2e-23




PFam analysis predicts that the NOV87a protein contains the domains shown in
Table 87F.



Table 87F. Domain Analysis of NOV87a



WO 03/023002 PCT/US02/28539



Pfam Domain | NOV87a Match Region | Identities/Similarities for the Matched Region | Expect Value





Example 88. 

The NOV88 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 88A. 




Table 88A. NOV88 Sequence Analysis

SEQ ID NO: 317 | 1911 bp
NOV88a, CG59983-01 DNA Sequence


CGGCGAGTGAATCTACCCAGGAGCTGAAGCTTCCCTTGGCAACTTGGGGACACCAGGGCCACA 


GTGACTTTGTGGCTTGGCTTTAGATAAATTTGACCATGGGCTGTAGAACCCAGCAGCTCAGAA 


TCCATTAAAAGGAGAGCTGGGAGGAGAATGAAGAAGATGAGCAACATTTATGAGTCCGCTGCC
AACACACTGGGAATCTTTAACAGCCCCTGCCTGACCAAAGTTGAGCTGCGTGTGGCGTGCAAA 
GGCATTTCTGACAGAGATGCCCTTTCCAAACCAGGCCCCTGTGTCATCCTCAAGATGCAGTCT
CATGGGCAGTGGTTTGAGGTTGACAGGACTGAGGTGATTCGCACCTGCATAAACCCAGTGTAC 
TCAAAACTGTTTACTGTGGACTTTTACTTTGAGGAGGTGCAGCGCCTGCGGTTTGAAGTCCAT 
GACATCAGCAGCAACCACAATGGGCTGAAGGAGGCCGACTTCCTTGGTGGCATGGAGTGCACA 
CTTGGCCAGATTGTTTCCCAGAGAAAGCTGTCCAAATCCTTGCTGAAGCATGGGAACACAGCA 
GGGAAATCTTCCATCGCGGTGATTGCTGAAGAATTATCTGGCAATGACGACTATGTTGAGCTT 
GCATTCAATGCACGGAAATTGGATGACAAGGATTTCTTCAGTAAATCTGACCCATTTCTGGAA 
ATTTTTCGTATGAATGATGATGCAACTCAGCAGCTGGTGCACCGAACTGAGGTTGTGATGAAT
AACTTAAGCCCAGCCTGGAAATCATTCAAAGTATCTGTAAATTCTCTATGCAGCGGAGACCCA 
GACCGCCGGCTAAAGTGCATAGTATGGGACTGGGACTCCAATGGCAAGCATGACTTCATTGGA 
GAATTCACCTCGACATTCAAGGAGATGAGAGGAGCAATGGAAGGGAAACAGGTGCAGTGGGAG 
TGCATCAATCCCAAGTACAAAGCCAAGAAGAAGAATTACAAGAACTCAGGCACTGTGATTCTG 
AATCTGTGCAAGATTCACAAGATGCATTCTTTCTTGGACTACATCATGGGTGGCTGCCAAATC 
CAGTTTACAGTAGCTATAGATTTCACTGCCTCAAACGGGGACCCCAGGAACAGCTGTTCCTTG 
CACTACATCCACCCTTACCAACCCAATGAGTATCTGAAAGCTTTGGTAGCTGTGGGGGAGATT 
TGCCAAGACTATGACAGTGACAAAATGTTCCCTGCCTTTGGGTTTGGCGCCAGGATACCTCCA 
GAGTACACGGTCTCTCATGACTTTGCAATCAACTTTAATGAAGAACAACCCAAATGTGCAGGA 
ATTCAAGGAGTTGTGGAAGCCTATCAGAGCTGTCTTCCTAAGCTCCAACTCTACGGTCCCACC 
AACATTGCCCCCATCATCCAGAAGGTTGCCAAGTCAGCGTCAGAGGAAACTAACACCAAGGAG 
GCATCGCAATACTTCATCCTGCTGATCCTGACAGATGGTGTTATCACAGACATGGCCGACACC 
CGGGAGGCCATTGTCCATGCCTCCCACCTCCCCATGTCAGTCATCATCGTGGGAGTAGGGAAC 
GCTGACTTCAGTGACATGCAGATGCTGGACGGTGATGATGGGATTCTGAGGTCACCCAAGGGA 
GAGCCTGTTCTTCGAGACATCGTCCAGTTCGTGCCCTTCAGGAACTTCAAACACGCATCTCCA 
GCTGCCCTGGCAAAGAGCGTGCTGGCTGAAGTCCCAAACCAAGTTGTGGACTATTACAATGGC
AAAGGAATTAAACCAAAATGTTCATCAGAAATGTATGAATCTTCCAGCACACTAGCACCATGA 
ACTCCCCACACAGTTTTACAGAGTTCTGAAATACTATTCCTGCTAATATTTCATATTTAATAC 


TTCTACTATTCCTGCAAATGG 




ORF Start: ATG at 154 | ORF Stop: TGA at 1825
SEQ ID NO: 318 | 557 aa | MW at 62264.4kD
NOV88a, CG59983-01 Protein Sequence


MKKMSNIYESAANTLGIFNSPCLTKVELRVACKGISDRDALSKPGPCVILKMQSHGQWFEVDR
TEVIRTCINPVYSKLFTVDFYFEEVQRLRFEVHDISSNHNGLKEADFLGGMECTLGQIVSQRK 
LSKSLLKHGNTAGKSSIAVIAEELSGNDDYVELAFNARKLDDKDFFSKSDPFLEIFRMNDDAT
QQLVHRTEVVMNNLSPAWKSFKVSVNSLCSGDPDRRLKCIVWDWDSNGKHDFIGEFTSTFKEM
RGAMEGKQVQWECINPKYKAKKKNYKNSGTVILNLCKIHKMHSFLDYIMGGCQIQFTVAIDFT 
ASNGDPRNSCSLHYIHPYQPNEYLKALVAVGEICQDYDSDKMFPAFGFGARIPPEYTVSHDFA 
INFNEEQPKCAGIQGVVEAYQSCLPKLQLYGPTNIAPIIQKVAKSASEETNTKEASQYFILLI 
LTDGVITDMADTREAIVHASHLPMSVIIVGVGNADFSDMQMLDGDDGILRSPKGEPVLRDIVQ 
FVPFRNFKHASPAALAKSVLAEVPNQVVDYYNGKGIKPKCSSEMYESSSTLAP




SEQ ID NO: 319 | 1795 bp
NOV88b, CG59983-02 DNA Sequence

AGTGACTTTGTGGCTTGGCTTTAGATAAATTTGACCATGGCTGTAGAACCCAGCAGCTCAGAA


TCCATTAAAAGGAGAGCTGGGAGGAGAATGAAGAAGATGAGCAACATTTATGAGTCCGCTGCC 
AACACACTGGGAATCTTTAACAGCCCCTGCCTGACCAAAGTTGAGCTGCGTGTGGCGTGCAAA
GGCATTTCTGACAGAGATGCCCTTTCCAAACCAGACCCCTGTGTCATCCACAAGATGCAGTCT 
CATGGGCAGTGGTTTGAGGTTGACAGGACTGAGGTGATTCGCACCTGCATAAACCCAGTGTAC
TCAAAACTGTTTACTGTGGACTTTTACTTTGAGGAGGTGCAGCGCCTGCGGTTTGAAGTCCAT 
GACATCAGCAGCAACCACAATGGGCTGAAGGAGGCCGACTTCCTTGGTGGCATGGAGTGCACA 
CTTGGCCAGATTGTTTCCCAGAGAAAGCTGTCCAAATCCTTGCTGAAGCATGGGAACACAGCA 
GGGAAATCTTCCATCACGGTGATTGCTGAAGAATTATCTGGCAATGACGACTATGTTGAGCTT 
GCATTCAATGCACGGAAATTGGATGACAAGGATTTCTTCAGTAAATCTGACCCATTTCTGGAA 
ATTTTTCGTATGAATGATGATGCAACTCAGCAGCTGGTGCACCGAACTGAGGTTGTGATGAAT 
AACTTAAGCCCAGCCTGGAAATCATTCAAAGTATCTGTAAATTCTCTATGCAGCGGAGACCCA 
GACCGCCGGCTAAAGTGCATAGTATGGGACTGGGACTCCAATGGCAAGCATGACTTCATTGGA 
GAATTCACCTCGACATTCAAGGAGATGAGAGGAGCAATGGAAGGGAAACAGGTGCAGTGGGAG 
TGCATCAATCCCAAGTACAAAGCCAAGAAGAAGAATTACAAGAACTCAGGCACTGTGATTCTG 
AATCTGTGCAAGATTCACAAGATGCATTCTTTCTTGGACTACATCATGGGTGGCTGCCAAATC 
CAGTTTACAGTAGCTATAGATTTCACTGCCTCAAACGGGGACCCCAGGAACAGCTGTTCCTTG 
CACTACATCCACCCTTACCAACCCAATGAGTATCTGAAAGCTTTGGTAGCTGTGGGGGAGATT 
TGCCAAGACTATGACAGTGACAAAATGTTCCCTGCCTTTGGGTTTGGCGCCAGGATACCTCCA
GAGTACACGGTCTCTCATGACTTTGCAATCAACTTTAATGAAGACAACCCAGAATGTGCAGGA 
ATTCAAGGAGTTGTGGAAGCCTATCAGAGCTGTCTTCCTAAGCTCCAACTCTACGGTCCCACC 
AACATTGCCCCCATCATCCAGAAGGTTGCCAAGTCGGCGTCAGAGGAAACTAACACCAAGGAG 
GCATCGCAATACTTCATCCTGCTGATCCTGACAGATGGTGTTATCACAGACATGGCCGACACC 
CGGGAGGCCATTGTCCATGCCTCCCACCTCCCCATGTCAGTCATCATCGTGGGAGTAGGGAAC 
GCTGACTTCAGTGACATGCAGATGCTGGACGGTGATGATGGGATTCTGAGGTCACCCAAGGGA 
GAGCCTGTTCTTCGAGACATCGTCCAGTTCGTGCCCTTCAGGAACTTCAAACATGCATCTCCA 
GCTGCCCTGGCAAAGAGCGTGCTGGCTGAAGTCCCAAACCAAGTTGTGGACTATTACAATGGC 
AAAGGAATTAAACCAAAATGTTCATCAGAAATGTATGAATCTTCCAGAACACTAGCACCATGA
ACTCCCCACACAGTTTTACAGAGTTCTGAAA 




ORF Start: ATG at 91 | ORF Stop: TGA at 1762
SEQ ID NO: 320 | 557 aa | MW at 62418.4kD
NOV88b, CG59983-02 Protein Sequence


MKKMSNIYESAANTLGIFNSPCLTKVELRVACKGISDRDALSKPDPCVIHKMQSHGQWFEVDR 
TEVIRTCINPVYSKLFTVDFYFEEVQRLRFEVHDISSNHNGLKEADFLGGMECTLGQIVSQRK 
LSKSLLKHGNTAGKSSITVIAEELSGNDDYVELAFNARKLDDKDFFSKSDPFLEIFRMNDDAT 
QQLVHRTEVVMNNLSPAWKSFKVSVNSLCSGDPDRRLKCIVWDWDSNGKHDFIGEFTSTFKEM 
RGAMEGKQVQWECINPKYKAKKKNYKNSGTVILNLCKIHKMHSFLDYIMGGCQIQFTVAIDFT 
ASNGDPRNSCSLHYIHPYQPNEYLKALVAVGEICQDYDSDKMFPAFGFGARIPPEYTVSHDFA 
INFNEDNPECAGIQGVVEAYQSCLPKLQLYGPTNIAPIIQKVAKSASEETNTKEASQYFILLI
LTDGVITDMADTREAIVHASHLPMSVIIVGVGNADFSDMQMLDGDDGILRSPKGEPVLRDIVQ 
FVPFRNFKHASPAALAKSVLAEVPNQVVDYYNGKGIKPKCSSEMYESSRTLAP



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 88B. 



Table 88B. Comparison of NOV88a against NOV88b.

Protein Sequence | NOV88a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV88b | 1..557; 1..557 | 529/557 (94%); 531/557 (94%)




Further analysis of the NOV88a protein yielded the following properties shown in 
Table 88C. 
Table 88C. Protein Sequence Properties NOV88a

PSort analysis: 0.4500 probability located in cytoplasm; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV88a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 88D.



Table 88D. Geneseq Results for NOV88a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV88a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
ABB06047 | Human NS protein sequence SEQ ID NO:139 - Homo sapiens, 564 aa. [WO200206315-A2, 24-JAN-2002] | 11..543; 9..543 | 375/535 (70%); 453/535 (84%) | 0.0
ABB10990 | Human copine VII homologue, SEQ ID NO:1360 - Homo sapiens, 415 aa. [WO200157188-A2, 09-AUG-2001] | 60..444; 8..394 | 354/387 (91%); 362/387 (93%) | 0.0
AAY49835 | Mouse neuronal activity regulated C2-domain containing protein - Mus sp, 557 aa. [JP11269198-A, 05-OCT-1999] | 3..542; 5..541 | 340/543 (62%); 429/543 (78%) | 0.0
AAY49836 | Human neuronal activity regulated C2-domain containing protein - Homo sapiens, 557 aa. [JP11269198-A, 05-OCT-1999] | 24..542; 20..541 | 334/522 (63%); 424/522 (80%) | 0.0
AAY49834 | Mammalian brain specific generic protein - Mammalia, 557 aa. [JP11269198-A, 05-OCT-1999] | 24..542; 20..541 | 333/522 (63%); 419/522 (79%) | 0.0



In a BLAST search of public sequence databases, the NOV88a protein was found to
have homology to the proteins shown in the BLASTP data in Table 88E.



Table 88E. Public BLASTP Results for NOV88a

Protein Accession Number | Protein/Organism/Length | NOV88a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q8TEX1 | Copine-like protein isoform B - Homo sapiens (Human), 575 aa. | 1..557; 19..575 | 551/557 (98%); 553/557 (98%) | 0.0
Q96A23 | CDNA FLJ31613 fis, clone NT2RI2002958, moderately similar to Homo sapiens copine VI protein (Similar to RIKEN cDNA 3632411M23 gene) (Copine-like protein isoform A) - Homo sapiens (Human), 557 aa. | 1..557; 1..557 | 551/557 (98%); 553/557 (98%) | 0.0
Q9Z140 | Copine VI (Neuronal-copine) (N-copine) - Mus musculus (Mouse), 557 aa. | 3..542; 5..541 | 340/543 (62%); 429/543 (78%) | 0.0
O95741 | Copine VI (Neuronal-copine) (N-copine) - Homo sapiens (Human), 557 aa. | 24..542; 20..541 | 334/522 (63%); 424/522 (80%) | 0.0
Q8WVG1 | Copine VI (neuronal) - Homo sapiens (Human), 557 aa. | 24..542; 20..541 | 333/522 (63%); 424/522 (80%) | 0.0



PFam analysis predicts that the NOV88a protein contains the domains shown in
Table 88F.



Table 88F. Domain Analysis of NOV88a

Pfam Domain | NOV88a Match Region | Identities/Similarities for the Matched Region | Expect Value
C2 | 28..115 | 20/100 (20%); 59/100 (59%) | 0.085
C2 | 161..246 | 26/102 (25%); 62/102 (61%) | 0.00057




Example 89. 

The NOV89 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 89A. 



Table 89A. NOV89 Sequence Analysis

SEQ ID NO: 321 | 724 bp
NOV89a, CG93335-01 DNA Sequence


CAGGAGGCGGGTGGGTCAAGGTAACTCTGGGCTACAGAGTCCTTGCTGGGGGTTCGGGGAGCG 


CTTGGACCCCGGCTTCTGGGACGCGTCAGAATATTATCCAGCAATGCAAATGAACAAACTATA 


ACTACACACAGCTGCATGGATAAATGTCAGAAACATGACGTTGAGTGTGAGAAGCCAGATGCA 
AACGAGGACTCACTGTGCAATTCTGTGCATGTACAGTGGCCAGGAGAAGGGAGCACTGGCTTT 
GCTTTCATCAGGCCAAAGATGCCTTTCTTTGGGAATACGTTCAGTCCGAAGAAGACACCTCCT 
CGGAAGTCGGCATCTCTCTCCAACCTGCATTCTTTGGATCGATCAACCCGGGAGGTGGAGCTG 
GGCTTGGAATACGGATCCCCGACTATGAACCTGGCAGGGCAAAGCCTGAAGTTTGAAAATGGC 
CAGTGGATAGCAGAGACAGGGGTTAGTGGCGGTGTGGACCGGAGGGAGGTTCAGCGCCTTCGC 
AGGCGGAACCAGCAGTTGGAGGAAGAGAACAATCTCTTGCGGCTGAAAGTGGACATCTTATTA 
GACATGCTTTCAGAGTCCACTGCTGAATCCCACTTAATGGAGAAGGAACTGGATGAACTGAGG 
ATCAGCCGGAAGAGAAAATGAAGACCCCAGAGACATTTATTGGGGAGTAGGATGTGGCTGAGT 
GCTTTTTTTTTGGCCAGACTAGCGGATTCAG




ORF Start: ATG at 142 | ORF Stop: TGA at 649
SEQ ID NO: 322 | 169 aa | MW at 19286.6kD
NOV89a, CG93335-01 Protein Sequence


MDKCQKHDVECEKPDANEDSLCNSVHVQWPGEGSTGFAFIRPKMPFFGNTFSPKKTPPRKSAS 
LSNLHSLDRSTREVELGLEYGSPTMNLAGQSLKFENGQWIAETGVSGGVDRREVQRLRRRNQQ 
LEEENNLLRLKVDILLDMLSESTAESHLMEKELDELRISRKRK 




SEQ ID NO: 323 | 615 bp
NOV89b, CG93335-02 DNA Sequence


CGTCAGAATATTATCCAGCAATGCAAATGAACAAACTATAACTACACACAGCTGCATGGATAA 


ATGTCAGAAACATGACGTTGAGTGTGAGAAGCCAGATGCAAACGAGGACTCACTGTGCAATTC 
TGTGCATGTACAGTGGCCAGGAGAAGGGAGCACTGGCTTTGCTTTCATCAGGCCAAAGATGCC 
TTTCTTTGGGAATACGTTCAGTCCGAAGAAGACACCTCCTCGGAAGTCGGCATCTCTCTCCAA 
CCTGCATTCTTTGGATCGATCAACCCGGGAGGTGGAGCTGGGCTTGGAATACGGATCCCCGAC 
TATGAACCTGGCAGGGCAAAGCCTGAAGTTTGAAAATGGCCAGTGGATAGCAGAGACAGGGGT 
TAGTGGCGGTGTGGACCGGAGGGAGGTTCAGCGCCTTCGCAGGCGGAACCAGCAGTTGGAGGA 
AGAGAACAATCTCTTGCGGCTGAAAGTGGACATCTTATTAGACATGCTTTCAGAGTCCACTGC 
TGAATCCCACTTAATGGAGAAGGAACTGGATGAACTGAGGATCAGCCGGAAGAGAAAATGAAG
ACCCCAGAGACATTTATTGGGGAGTAGGATGTGGCTGAGTGCTTTTTT 




ORF Start: ATG at 56 | ORF Stop: TGA at 563
SEQ ID NO: 324 | 169 aa | MW at 19286.6kD
NOV89b, CG93335-02 Protein Sequence


MDKCQKHDVECEKPDANEDSLCNSVHVQWPGEGSTGFAFIRPKMPFFGNTFSPKKTPPRKSAS 
LSNLHSLDRSTREVELGLEYGSPTMNLAGQSLKFENGQWIAETGVSGGVDRREVQRLRRRNQQ
LEEENNLLRLKVDILLDMLSESTAESHLMEKELDELRISRKRK 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 89B. 



Table 89B. Comparison of NOV89a against NOV89b.

Protein Sequence | NOV89a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV89b | 1..169; 1..169 | 146/169 (86%); 146/169 (86%)






Further analysis of the NOV89a protein yielded the following properties shown in 
Table 89C. 



Table 89C. Protein Sequence Properties NOV89a

PSort analysis: 0.4600 probability located in nucleus; 0.3000 probability located in microbody (peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000 probability located in lysosome (lumen)
SignalP analysis: No Known Signal Sequence Predicted

A search of the NOV89a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 89D.



Table 89D. Geneseq Results for NOV89a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV89a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAM00955 | Human bone marrow protein, SEQ ID NO: 431 - Homo sapiens, 175 aa. [WO200153453-A2, 26-JUL-2001] | 31..169/37..175 | 139/139 (100%); 139/139 (100%) | 1e-74
AAY86201 | Nuclear transport protein clone hfb2025 protein sequence - Homo sapiens, 67 aa. [WO9964455-A1, 16-DEC-1999] | 103..169/1..67 | 67/67 (100%); 67/67 (100%) | 6e-30
ABB23535 | Protein #5534 encoded by probe for measuring heart cell gene expression - Homo sapiens, 26 aa. [WO200157274-A2, 09-AUG-2001] | 44..69/1..26 | 26/26 (100%); 26/26 (100%) | 2e-08
AAB69070 | Human male enhanced antigen-2 (MEA-2) protein sequence SEQ ID NO:2 - Homo sapiens, 1374 aa. [JP2000316580-A, 21-NOV-2000] | 62..163/768..868 | 25/102 (24%); 45/102 (43%) | 1.2
ABB06335 | Human GDMLP-1 orthologue BAA93660 protein sequence - Homo sapiens, 1694 aa. [WO200192524-A2, 06-DEC-2001] | 116..151/917..952 | 16/36 (44%); 25/36 (69%) | 1.6






In a BLAST search of public sequence databases, the NOV89a protein was found to
have homology to the proteins shown in the BLASTP data in Table 89E.



Table 89E. Public BLASTP Results for NOV89a

Protein Accession Number | Protein/Organism/Length | NOV89a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
Q9Y3M2 | Protein C22orf2 (Cytosolic leucine-rich protein) (HRIHFB2025) - Homo sapiens (Human), 126 aa. | 44..169/1..126 | 126/126 (100%); 126/126 (100%) | 4e-66
AAM73678 | Leucine-rich protein - Bos taurus (Bovine), 127 aa. | 44..169/1..127 | 115/127 (90%); 124/127 (97%) | 1e-60
AAM73679 | Leucine-rich protein - Rattus norvegicus (Rat), 127 aa. | 44..169/1..126 | 105/126 (83%); 120/126 (94%) | 1e-56
Q9D1C2 | Protein C22orf2 homolog (Cytosolic leucine-rich protein) - Mus musculus (Mouse), 127 aa. | 44..169/1..126 | 104/126 (82%); 120/126 (94%) | 1e-56
AAM73681 | Leucine-rich protein - Brachydanio rerio (Zebrafish) (Danio rerio), 125 aa. | 44..169/1..124 | 93/126 (73%); 114/126 (89%) | 2e-49



PFam analysis predicts that the NOV89a protein contains the domains shown in the 
Table 89F. 



Table 89F. Domain Analysis of NOV89a

Pfam Domain | NOV89a Match Region | Identities/Similarities for the Matched Region | Expect Value



Example 90. 

The NOV90 clone was analyzed, and the nucleotide and encoded polypeptide 
sequences are shown in Table 90A. 



Table 90A. NOV90 Sequence Analysis



SEQ ID NO: 325

[ ggatcgaatcgcggccgcgtcgacggttt^aggcgctttcctcttggaagtggcgactgctg 



NOV90a, 
CG94377-01 DNA
Sequence 



CGGGCCTGAGCGCTGGTCTCACGCGCCTCGGGAGCCAGGTTGGCGGCGCGATGAGGCGCAGCA 



AGGCTGACGTGGAGCGGTACATCGCCTCGGTGCAGGGCTCCACCCCGTCGCCTCGACAGAAGT 
CAATGAAAGGATTCTATTTTGCAAAGCTGTATTATGAAGCTAAAGAATATGATCTTGCTAAAA 
AATACATATGTACTTACATTAATGTGCAAGAGAGGGATCCCAAAGCTCACAGATTTCTGGGTC 
TTCTTTATGAATTGGAAGAAAACACAGACAAAGCCGTTGAATGTTACAGGCGTTCAGTGGAAT 
TAAACCCAACACAAAAAGATCTTGTGTTGAAGATTGCAGAATTGCTTTGTAAAAATGATGTTA 
CTGATGGAAGAGCAAAATACTGGCTTGAAAGAGCAGCCAAACTTTTCCCAGGAAGTCCTGCAA 
TTTATAAACTAAAGGAACAGCTTCTAGATTGTGAAGGTGAAGATGGATGGAATAAACTTTTTG 
ACTTGATTCAGTCAGAACTTTATGTAAGACCTGATGACGTCCATGTGAACATCCGGCTAGTGG 
AGGTGTATCGCTCAACTAAAAGATTGAAGGATGCTGTGGCCCACTGCCATGAGGCAGAGAGGA 
ACATAGCTTTGCGTTCAAGTTTAGAATGGAATTCGTGTGTTGTACAGACCCTTAAGGAATATC 
TGGAGTCTTTACAGTGTTTGGAGTCTGATAAAAGTGACTGGCGAGCAACCAATACAGACTTAC 
TGCTGGCCTATGCTAATCTTATGCTTCTTACGCTTTCCACTAGAGATGTGCAGGAAAGTAGAG 
AATTACTGCAAAGTTTTGATAGTGCTCTTCAGTCTGTGAAATCTTTGGGTGGAAATGATGAAC
TGTCAGCTACTTTCTTAGAAATGAAAGGACATTTCTACATGCATGCTGGTTCTCTGCTTTTGA
AGATGGGTCAGCATAGTAGTAATGTTCAATGGCGAGCTCTTTCTGAGCTGGCTGCATTGTGCT 
ATCTCATAGCATTTCAGGTTCCAAGACCAAAGATTAAATTAATAAAAGGTGAAGCTGGACAAA 
ATCTGCTGGAAATGATGGCCTGTGACCGACTGAGCCAATCAGGGCACATGTTGCTAAACTTAA 
GTCGTGGCAAGCAAGATTTTTTAAAAGAGATTGTTGAAACTTTTGCCAACAAAAGCGGGCAGT 
CTGCATTATATGATGCTCTGTTTTCTAGTCAGTCACCTAAGGATACATCTTTTCTTGGTAGCG 
ATGATATTGGAAACATTGATGTACGAGAACCAGAGCTTGAAGATTTGACTAGATACGATGTTG 
GTGCTATTCGAGCACATAATGGTAGTCTTCAGCACCTTACTTGGCTTGGCTTACAGTGGAATT 
CATTGCCTGCTTTACCTGGAATCCGAAAATGGCTAAAACAGCTTTTCCATCATTTGCCCCATG 
AAACCTCAAGGCTTGAAACAAATGCACCTGAATCAATATGTATTTTAGATCTTGAAGTATTTC 
TCCTTGGAGTAGTATATACCAGCCACTTACAATTAAAGGAGAAATGTAATTCTCACCACAGCT
CCTATCAGCCGTTATGCCTGCCCCTTCCTGTGTGTAAACAGCTTTGTACAGAAAGACAAAAAT 
CTTGGTGGGATGCGGTTTGTACTCTGATTCACAGAAAAGCAGTACCTGGAAACGTAGCAAAAT 
TGAGACTTCTAGTTCAGCATGAAATAAACACTCTAAGAGCCCAGGAAAAACATGGCCTTCAAC 
CTGCTCTGCTTGTACATTGGGCAGAATGCCTTCAGAAAACGGGCAGCGGTCTTAATTCTTTTT 
ATGATCAACGAGAATACATAGGGAGAAGTGTTCATTATTGGAAGAAAGTTTTGCCATTGTTGA 
AGATAATAAAAAAGAAGAACAGTATTCCTGAACCTATTGATCCTCTGTTTAAACATTTTCATA 
GTGTAGACATTCAGGCATCAGAAATTGTTGAATATGAAGAAGACGCACACATAACTTTTGCTA 
TATTGGATGCAGTAAATGGAAATATAGAAGATGCTGTGACTGCTTTTGAATCTATAAAAAGTG 
TTGTTTCTTATTGGAATCTTGCACTGATTTTTCACAGGAAGGCAGAAGACATTGAAAATGATG 
CCCTTTCTCCTGAAGAACAAGAAGAATGCAAAAATTATCTGAGAAAGACCAGGGACTACCTAA 
TAAAGATTATAGATGACAGTGATTCAAATCTTTCAGTGGTCAAGAAATTGCCTGTGCCCCTGG 
AGTCTGTAAAAGAGATGCTTAATTCAGTCATGCAGGAACTCGAAGACTATAGTGAAGGAGGTC 
CTCTCTATAAAAATGGTTCTTTGCGAAATGCAGATTCAGAAATAAAACGTTCTACACCGTCTC 
CTACCAGATATTCACTATCACCAAGTAAAAGTTACAAGTATTCTCCCAAAACACCACCTCGAT 
GGGCAGAAGATCAGAATTCTTTACTGAAAATGATTTGCCAACAAGTAGAGGCCATTAAGAAAG 
AAATGCAGGAGTTGAAACTAAATAGCAGTAACTCAGCATCCCCTCATCGTTGGCCCACAGAGA 
ATTATGGACCAGACTCGGTGCCTGATGGATATCAGGGGTCACAGACATTTCATGGGGCTCCAC 
TAACAGTTGCAACTACTGGCCCTTCAGTATATTATAGTCAGTCACCAGCATATAATTCCCAGT 
ATCTTCTCAGACCAGCAGCTAATGTTACTCCCACAAAGGGCCCAGTCTATGGCATGAATAGGC 
TTCCACCCCAACAGCATATTTATGCCTATCCGCAACAGATGCACACACCGCCAGTGCAAAGCT 
CATCTGCTTGTATGTTCTCTCAGGAGATGTATGGTCCTCCTGCATTGCGTTTTGAGTCTCCTG 
CAACGGGAATTCTATCGCCCAGGGGTGATGATTACTTTAATTACAATGTTCAACAGACAAGCA 
CAAATCCACCTTTGCCAGAACCAGGATATTTCACAAAACCTCCGATTGCAGCTCATGCTTCAA 
GATCTGCAGAATCTAAGACTATAGAATTTGGGAAAACTAATTTTGTTCAGCCCATGCCGGGTG 
AAGGATTAAGGCCATCTTTGCCAACACAAGCACACACAACACAGCCAACTCCTTTTAAATTTA 
ACTCAAATTTCAAATCAAATGATGGTGACTTCACGTTTTCCTCACCACAGGTTGTGACACAGC
CCCCTCCTGCAGCTTACAGTAACAGTGAAAGCCTTTTAGGTCTCCTGACTTCAGATAAACCCT 
TGCAAGGAGATGGCTATAGTGGAGCCAAACCAATTCCTGGTGGTCAAACCATTGGGCCTCGAA 
ATACATTCAATTTTGGAAGCAAAAATGTGTCTGGAATTTCATTTACAGAAAACATGGGGTCGA 
GTCAGCAAAAGAATTCTGGTTTTCGGCGAAGTGATGATATGTTTACTTTCCATGGTCCAGGGA 
AATCAGTATTTGGAACACCCACTTTAGAGACAGCAAACAAGAATCATGAGACAGATGGAGGAA 
GTGCCCATGGGGATGATGATGATGACGGTCCTCACTTTGAGCCTGTAGTACCTCTTCCTGATA 
AGATTGAAGTAAAAACTGGTGAGGAAGATGAAGAAGAATTCTTTTGCAACCGCGCGAAATTGT
TTCGTTTCGATGTAGAATCCAAAGAATGGAAAGAACGTGGGATTGGCAATGTAAAAATACTGA 
GGCATAAAACATCTGGTAAAATTCGCCTTCTAATGAGACGAGAGCAAGTATTGAAAATCTGTG 
CAAATCATTACATCAGTCCAGATATGAAATTGACACCAAATGCTGGATCAGACAGATCTTTTG 
TATGGCATGCCCTTGATTATGCAGATGAGTTGCCAAAACCAGAACAACTTGCTATTAGGTTCA 
AAACTCCTGAGGAAGCAGCACTTTTTAAATGCAAGTTTGAAGAAGCCCAGAGCATTTTAAAAG 
CCCCAGGAACAAATGTAGCCATGGCGTCAAATCAGGCTGTCAGAATTGTAAAAGAACCCACAA 
GTCATGATAACAAGGATATTTGCAAATCTGATGCTGGAAACCTGAATTTTGAATTTCAGGTTG 
CAAAGAAAGAAGGGTCTTGGTGGCATTGTAACAGCTGCTCATTAAAGAATGCTTCAACTGCTA 
AGAAATGTGTATCATGCCAAAATCTAAACCCAAGCAATAAAGAGCTCGTTGGCCCACCATTAG 
CTGAAACTGTTTTTACTCCTAAAACCAGCCCAGAGAATGTTCAAGATCGATTTGCATTGGTGA 
CTCCAAAGAAAGAAGGTCACTGGGATTGTAGTATTTGTTTAGTAAGAAATGAACCTACTGTAT
CTAGGTGCATTGCGTGTCAGAATACAAAATCTGCTAACAAAAGTGGATCTTCATTTGTTCATC 
AAGCTTCATTTAAATTTGGCCAGGGAGATCTTCCTAAACCTATTAACAGTGATTTCAGATCTG 
TTTTTTCTACAAAGGAAGGACAGTGGGATTGCAGTGCATGTTTGGTACAAAATGAGGGGAGCT 
CTACAAAATGTGCTGCTTGTCAGAATCCGAGAAAACAGAGTCTACCTGCTACTTCTATTCCAA 
CACCTGCCTCTTTTAAGTTTGGTACTTCAGAGACAAGTAAAACTCTAAAAAGTGGATTTGAAG 
ACATGTTTGCTAAGAAGGAAGGACAGTGGGATTGCAGTTCATGCTTAGTGCGAAATGAAGCAA 
ATGCTACAAGATGTGTTGCTTGTCAGAATCCGGATAAACCAAGTCCATCTACTTCTGTTCCAG 
CTCCTGCCTCTTTTAAGTTTGGTACTTCAGAGACAAGCAAGGCTCCAAAGAGCGGATTTGAGG 
GAATGTTCACTAAGAAGGAGGGACAGTGGGATTGCAGTGTGTGCTTAGTAAGAAATGAAGCCA 
GTGCTACCAAATGTATTGCTTGTCAGAATCCAGGTAAACAAAATCAAACTACTTCTGCAGTTT 
CAACACCTGCCTCTTCAGAGACAAGCAAGGCTCCAAAGAGCGGATTTGAGGGAATGTTCACTA 
AGAAGGAGGGACAGTGGGATTGCAGTGTGTGCTTAGTAAGAAATGAAGCCAGTGCTACCAAAT 
GTATTGCTTGTCAGAATCCAGGTAAACAAAATCAAACTACTTCTGCAGTTTCAACACCTGCCT
CTTCAGAGACAAGCAAGGCTCCAAAGAGCGGATTTGAGGGAATGTTCACTAAGAAGGAAGGAC
AGTGGGATTGCAGTGTGTGCTTAGTAAGAAATGAAGCCAGTGCTACCAAATGTATTGCTTGTC
AGTGTCCAAGTAAACAAAATCAAACAACTGCAATTTCAACACCTGCCTCTTCGGAGATAAGCA 
AGGCTCCAAAGAGTGGATTTGAAGGAATGTTCATCAGGAAAGGACAGTGGGATTGTAGTGTTT 
GCTGTGTACAAAATGAGAGTTCTTCCTTAAAATGTGTGGCTTGTGATGCCTCTAAACCAACTC 
ATAAACCTATTGCAGAAGCTCCTTCAGCTTTCACACTGGGCTCAGAAATGAAGTTGCATGACT 
CTTCTGGAAGTCAGGTGGGAACAGGATTTAAAAGTAATTTCTCAGAAAAAGCTTCTAAGTTTG 
GCAATACAGAGCAAGGATTCAAATTTGGGCATGTGGATCAAGAAAATTCACCTTCATTTATGT 
TTCAGGGTTCTTCTAATACAGAATTTAAGTCAACCAAAGAAGGATTTTCCATCCCTGTGTCTG
CTGATGGATTTAAATTTGGCATTTCGGAACCAGGAAATCAAGAAAAGAAAAGTGAAAAGCCTC 
TTGAAAATGGTACTGGCTTCCAGGCTCAGGATATTAGTGGCCAGAAGAATGGCCGTGGTGTGA 
TTTTTGGCCAAACAAGTAGCACTTTTACATTTGCAGATCTTGCAAAATCAACTTCAGGAGAAG 
GATTTCAGTTTGGCAAAAAAGACCCCAATTTCAAGGGATTTTCAGGTGCTGGAGAAAAATTAT 
TCTCATCACAATACGGTAAAATGGCCAATAAAGCAAACACTTCCGGTGACTTTGAGAAAGATG
ATGATGCCTATAAGACTGAGGACAGCGATGACATCCATTTTGAACCAGTAGTTCAAATGCCCG 
AAAAAGTAGAACTTGTAACAGGAGAAGAAGATGAAAAAGTTCTGTATTCACAGCGGGTAAAAC 
TATTTAGATTTGATGCTGAGGTAAGTCAGTGGAAAGAAAGGGGCTTGGGGAACTTAAAAATTC 
TCAAAAACGAGGTCAATGGCAAACTAAGAATGCTGATGCGAAGAGAACAAGTACTAAAAGTGT 
GTGCTAATCATTGGATAACGACTACGATGAACCTGAAGCCTCTCTCTGGATCAGATAGAGCAT 
GGATGTGGTTAGCCAGTGATTTCTCTGATGGTGATGCCAAACTAGAGCAGTTGGCAGCAAAAT 
TTAAAACACCAGAGCTGGCTGAAGAATTCAAGCAGAAATTTGAGGAATGCCAGCGGCTTCTGT 
TAGACATACCACTTCAAACTCCCCATAAACTTGTAGATACTGGCAGAGCTGCCAAGTTAATAC 
AGAGAGCTGAAGAAATGAAGAGTGGACTGAAAGATTTCAAAACATTTTTGACAAATGATCAAA
CAAAAGTCACTGAGGAAGAAAATAAGGGTTCAGGTACAGGTGCGGCCGGTGCCTCAGACACAA 
CAATAAAACCCAATCCTGAAAACACTGGGCCCACATTAGAATGGGATAACTATGATTTAAGGG 
AAGATGCTTTGGATGATAGTGTCAGTAGTAGCTCAGTACATGCTTCTCCATTGGCAAGTAGCC 
CTGTGAGAAAAAATCTTTTCCGTTTTGGTGAGTCAACAACAGGATTTAACTTCAGTTTTAAAT 
CTGCTTTGAGTCCATCTAAGTCTCCTGCCAAGTTGAATCAGAGTGGGACTTCAGTTGGCACTG 
ATGAAGAATCTGATGTTACTCAAGAAGAAGAGAGAGATGGACAGTACTTTGAACCTGTTGTTC 
CTTTACCTGATCTAGTTGAAGTATCCAGTGGTGAGGAAAATGAACAAGTTGTTTTTAGTCACA 
GGGCAAAACTCTACAGATATGATAAAGATGTTGGTCAATGGAAAGAAAGGGGCATTGGTGATA 
TAAAGATTTTACAGAATTATGATAATAAGCAAGTTCGTATAGTGATGAGAAGGGACCAAGTAT 
TAAAACTTTGTGCCAATCACAGAATAACTCCAGACATGACTTTGCAAAATATGAAAGGGACAG
AAAGAGTATGGTTGTGGACTGCATGTGATTTTGCAGATGGAGAAAGAAAAGTAGAGCATTTAG 
CTGTTCGTTTTAAACTACAGGATGTTGCAGACTCGTTTAAGAAAATTTTTGATGAAGCAAAAA 
CAGCCCAGGAAAAAGATTCTTTGATAACACCTCATGTTTCTCGGTCAAGCACTCCCAGAGAGT 
CACCATGTGGCAAAATTGCTGTAGCTGTATTAGAAGAAACCACAAGAGAGAGGACAGATGTTA 
TTCAGGGTGATGATGTAGCAGATGCAACTTCAGAAGTTGAAGTGTCTAGCACATCTGAAACAA 
CACCAAAAGCAGTGGTTTCTCCTCCAAAGTTTGTATTTGGTTCAGAGTCTGTTAAAAGCATTT 
TTAGTAGTGAAAAATCAAAACCATTTGCATTCGGCAACAGTTCAGCCACTGGGTCTTTGTTTG 
GATTTAGTTTTAATGCACCTTTGAAAAGTAACAATAGTGAAACTAGTTCAGTAGCCCAGAGTG 
GATCTGAAAGCAAAGTGGAACCTAAAAAATGTGAACTGTCAAAGAACTCTGATATCGAACAGT 
CTTCAGATAGCAAAGTCAAAAATCTCTTTGCTTCCTTTCCAACGGAAGAATCTTCAATCAACT 
ACACATTTAAAACACCAGAAAAGGCAAAAGAGAAGAAAAAACCTGAAGATTCTCCCTCAGATG
ATGATGTTCTCATTGTATATGAACTAACTCCAACCGCTGAGCAGAAAGCCCTTGCAACCAAAC 
TTAAACTTCCTCCAACTTTCTTCTGCTACAAGAATAGACCAGATTATGTTAGTGAAGAAGAGG 
AGGATGATGAAGATTTCGAAACAGCTGTCAAGAAACTTAATGGAAAACTATATTTGGATGGCT 
CAGAAAAATGTAGACCCTTGGAAGAAAATACAGCAGATAATGAGAAAGAATGTATTATTGTTT 
GGGAAAAGAAACCAACAGTTGAAGAGAAGGCAAAAGCAGATACGTTAAAACTTCCACCTACAT 
TTTTTTGTGGAGTCTGTAGTGATACTGATGAAGACAATGGAAATGGGGAAGACTTTCAATCAG 
AGCTTCAAAAAGTTCAGGAAGCTCAAAAATCTCAGACAGAAGAAATAACTAGCACAACTGACA 
GTGTATATACAGGTGGGACTGAAGTGATGGTACCTTCTTTCTGTAAATCTGAAGAACCTGATT 
CTATTACCAAATCCATTAGTTCACCATCTGTTTCCTCTGAAACTATGGACAAACCTGTAGATT 
TGTCAACTAGAAAGGAAATTGATACAGATTCTACAAGCCAAGGGGAAAGCAAGATAGTTTCAT 
TTGGATTTGGAAGTAGCACAGGGCTCTCATTTGCAGACTTGGCTTCCAGTAATTCTGGAGATT 
TTGCTTTTGGTTCTAAAGATAAAAATTTCCAATGGGCAAATACTGGAGCAGCTGTGTTTGGAA 
CACAGTCAGTCGGAACCCAGTCAGCCGGTAAAGTTGGTGAAGATGAAGATGGTAGTGATGAAG 
AAGTAGTTCATAATGAAGATATCCATTTTGAACCAATAGTGTCACTACCAGAGGTAGAAGTAA 
AATCTGGAGAAGAAGATGAAGAAATTTTGTTTAAAGAGAGAGCCAAACTTTATAGATGGGATC
GGGATGTCAGTCAGTGGAAGGAGCGCGGTGTTGGAGATATAAAGATTCTTTGGCATACAATGA 
AGAATTATTACCGGATCCTAATGAGAAGAGACCAGGTTTTTAAAGTGTGTGCAAACCACGTTA 
TTACTAAAACAATGGAATTAAAGCCCTTAAATGTTTCAAATAATGCTTTAGTTTGGACTGCCT 
CAGATTATGCTGATGGAGAAGCAAAAGTAGAACAGCTTGCAGTGAGATTTAAAACTAAAGAAG 
TAGCTGATTGTTTCAAGAAAACATTTGAAGAATGTCAGCAGAATTTAATGAAACTCCAGAAAG 
GACATGTATCACTGGCAGCAGAATTATCAAAGGAGACCAATCCTGTGGTGTTTTTTGATGTTT 
GTGCGGACGGTGAACCTCTAGGGCGGATAACTATGGAATTATTTTCAAACATTGTTCCTCGGA 
CTGCTGAGAACTTCAGAGCACTATGCACTGGAGAGAAAGGCTTTGGTTTCAAGAATTCCATTT 
TTCACAGAGTAATTCCAGATTTTGTTTGCCAAGGAGGAGATATCACCAAACATGATGGAACAG 
GCGGACAGTCCATTTATGGAGACAAATTTGAAGATGAAAATTTTGATGTGAAACATACTGGTC 
CTGGTTTACTATCCATGGCCAATCAAGGCCAGAATACCAATAATTCTCAATTTGTTATAACAC 
TGAAGAAAGCAGAACATTTGGACTTTAAGCATGTAGTATTTGGGTTTGTTAAGGATGGCATGG 
ATACTGTGAAAAAGATTGAATCATTTGGTTCTCCCAAAGGGTCTGTTTGTCGAAGAATAACTA
TCACAGAATGTGGACAGATATAAAATCATTGTTGTTCATAGAAAATTTCATCTGTATAAGCAG
TTGGATTGAAGCTTAGCTATTACAATTTGATAGTTATGTTCAGCTTTTGAAAATGGACGTTTC 
CGATTTACAAATGTAAAATTGCAGCTTATAGCTGTTGTCACTTTTTAATGTGTTATAATTGAC


CTTGCATGGTGTGAAATAAAAGTTTAAACACTGGTGTATTTCAGGTGTACTTGTGTTTATGTA 


CTCCTGACGTATTAAAATGGAATAATACTAATCTTGTTAAAAGCAATAGACCTCAAACTATTG 


AAVjoAAJ. Al OA1 Ai AlbLAAl 1 1 AATTTTAATTCCTTTTAAGATATTTGGACTTCCTGCATGG 


ATATACTTACCATTTGAATAAAGGGACCACAACTTGGATAATTTAATTTTAGGTTTGAAATAT 


ATTTGGTAATCTTAACTATTGGTGTACTCATTTATGCATAGAGACTCGTTTATGAATGGGTAG 




AGCCACAGAACGTATAGAGTTAACCAAAGTGCTCTTCTCTAGAATCTTTACACCTCCTGTGTG 


GTTACAAGTTAACTTTGTAAGTAGCGTACCTTCCTTCCTTAAAATATCTAGCTTCCTGTGTTT


TTTCATAGATATTCGATTAATTTTTACATTTTAAACAAGTTGACTATTTCCTTTAGGGGTTTT 


GTTTCAAACTTTTCTGTCATCTGTCTCTACTACCTCAGAAACTGCAGCTTGGTTCTGATGATA 


GAAATTGAATTTTTCCTTGTAGTTATTGTGATAAAGTATGAATATTTTTAGAAAGTCTATACC 


ATGTTCTTTCGTTAAAGATTTGCTTTATACAAGATTGTTGCAGTACCTTTTTCTGGTAAATTT 


TGTAGCAGAAATAAAATGACAATTCCTAAG 




ORF Start: ATG at 114    ORF Stop: TAA at 9786

SEQ ID NO: 326    3224 aa    MW at 358214.5kD


NOV90a, 

CG94377-01 Protein 
Sequence 



MRRSKADVERYIASVQGSTPSPRQKSMKGFYFAKLYYEAKEYDLAKKYICTYINVQERDPKAH 
RFLGLLYELEENTDKAVECYRRSVELNPTQKDLVLKIAELLCKNDVTDGRAKYWLERAAKLFP
GSPAIYKLKEQLLDCEGEDGWNKLFDLIQSELYVRPDDVHVNIRLVEVYRSTKRLKDAVAHCH 
EAERNIALRSSLEWNSCVVQTLKEYLESLQCLESDKSDWRATNTDLLLAYANLMLLTLSTRDV
QESRELLQSFDSALQSVKSLGGNDELSATFLEMKGHFYMHAGSLLLKMGQHSSNVQWRALSEL 
AALCYLIAFQVPRPKIKLIKGEAGQNLLEMMACDRLSQSGHMLLNLSRGKQDFLKEIVETFAN 
KSGQSALYDALFSSQSPKDTSFLGSDDIGNIDVREPELEDLTRYDVGAIRAHNGSLQHLTWLG 
LQWNSLPALPGIRKWLKQLFHHLPHETSRLETNAPESICILDLEVFLLGVVYTSHLQLKEKCN 
SHHSSYQPLCLPLPVCKQLCTERQKSWWDAVCTLIHRKAVPGNVAKLRLLVQHEINTLRAQEK 
HGLQPALLVHWAECLQKTGSGLNSFYDQREYIGRSVHYWKKVLPLLKIIKKKNSIPEPIDPLF 
KHFHSVDIQASEIVEYEEDAHITFAILDAVNGNIEDAVTAFESIKSVVSYWNLALIFHRKAED
IENDALSPEEQEECKNYLRKTRDYLIKIIDDSDSNLSVVKKLPVPLESVKEMLNSVMQELEDY
SEGGPLYKNGSLRNADSEIKRSTPSPTRYSLSPSKSYKYSPKTPPRWAEDQNSLLKMICQQVE 
AIKKEMQELKLNSSNSASPHRWPTENYGPDSVPDGYQGSQTFHGAPLTVATTGPSVYYSQSPA 
YNSQYLLRPAANVTPTKGPVYGMNRLPPQQHIYAYPQQMHTPPVQSSSACMFSQEMYGPPALR 
FESPATGILSPRGDDYFNYNVQQTSTNPPLPEPGYFTKPPIAAHASRSAESKTIEFGKTNFVQ 
PMPGEGLRPSLPTQAHTTQPTPFKFNSNFKSNDGDFTFSSPQVVTQPPPAAYSNSESLLGLLT
SDKPLQGDGYSGAKPIPGGQTIGPRNTFNFGSKNVSGISFTENMGSSQQKNSGFRRSDDMFTF 
HGPGKSVFGTPTLETANKNHETDGGSAHGDDDDDGPHFEPVVPLPDKIEVKTGEEDEEEFFCN
RAKLFRFDVESKEWKERGIGNVKILRHKTSGKIRLLMRREQVLKICANHYISPDMKLTPNAGS 
DRSFVWHALDYADELPKPEQLAIRFKTPEEAALFKCKFEEAQSILKAPGTNVAMASNQAVRIV
KEPTSHDNKDICKSDAGNLNFEFQVAKKEGSWWHCNSCSLKNASTAKKCVSCQNLNPSNKELV 
GPPLAETVFTPKTSPENVQDRFALVTPKKEGHWDCSICLVRNEPTVSRCIACQNTKSANKSGS 
SFVHQASFKFGQGDLPKPINSDFRSVFSTKEGQWDCSACLVQNEGSSTKCAACQNPRKQSLPA 
TSIPTPASFKFGTSETSKTLKSGFEDMFAKKEGQWDCSSCLVRNEANATRCVACQNPDKPSPS 
TSVPAPASFKFGTSETSKAPKSGFEGMFTKKEGQWDCSVCLVRNEASATKCIACQNPGKQNQT 
TSAVSTPASSETSKAPKSGFEGMFTKKEGQWDCSVCLVRNEASATKCIACQNPGKQNQTTSAV 
STPASSETSKAPKSGFEGMFTKKEGQWDCSVCLVRNEASATKCIACQCPSKQNQTTAISTPAS 
SEISKAPKSGFEGMFIRKGQWDCSVCCVQNESSSLKCVACDASKPTHKPIAEAPSAFTLGSEM 
KLHDSSGSQVGTGFKSNFSEKASKFGNTEQGFKFGHVDQENSPSFMFQGSSNTEFKSTKEGFS 
IPVSADGFKFGISEPGNQEKKSEKPLENGTGFQAQDISGQKNGRGVIFGQTSSTFTFADLAKS 
TSGEGFQFGKKDPNFKGFSGAGEKLFSSQYGKMANKANTSGDFEKDDDAYKTEDSDDIHFEPV 
VQMPEKVELVTGEEDEKVLYSQRVKLFRFDAEVSQWKERGLGNLKILKNEVNGKLRMLMRREQ 
VLKVCANHWITTTMNLKPLSGSDRAWMWLASDFSDGDAKLEQLAAKFKTPELAEEFKQKFEEC 
QRLLLDIPLQTPHKLVDTGRAAKLIQRAEEMKSGLKDFKTFLTNDQTKVTEEENKGSGTGAAG 
ASDTTIKPNPENTGPTLEWDNYDLREDALDDSVSSSSVHASPLASSPVRKNLFRFGESTTGFN 
FSFKSALSPSKSPAKLNQSGTSVGTDEESDVTQEEERDGQYFEPVVPLPDLVEVSSGEENEQV
VFSHRAKLYRYDKDVGQWKERGIGDIKILQNYDNKQVRIVMRRDQVLKLCANHRITPDMTLQN

MKGTERVWLWTACDFADGERKVEHLAVRFKLQDVADSFKKIFDEAKTAQEKDSLITPHVSRSS 
TPRESPCGKIAVAVLEETTRERTDVIQGDDVADATSEVEVSSTSETTPKAVVSPPKFVFGSES
VKSIFSSEKSKPFAFGNSSATGSLFGFSFNAPLKSNNSETSSVAQSGSESKVEPKKCELSKNS 
DIEQSSDSKVKNLFASFPTEESSINYTFKTPEKAKEKKKPEDSPSDDDVLIVYELTPTAEQKA 
LATKLKLPPTFFCYKNRPDYVSEEEEDDEDFETAVKKLNGKLYLDGSEKCRPLEENTADNEKE 
CIIVWEKKPTVEEKAKADTLKLPPTFFCGVCSDTDEDNGNGEDFQSELQKVQEAQKSQTEEIT 
STTDSVYTGGTEVMVPSFCKSEEPDSITKSISSPSVSSETMDKPVDLSTRKEIDTDSTSQGES 
KIVSFGFGSSTGLSFADLASSNSGDFAFGSKDKNFQWANTGAAVFGTQSVGTQSAGKVGEDED 
GSDEEVVHNEDIHFEPIVSLPEVEVKSGEEDEEILFKERAKLYRWDRDVSQWKERGVGDIKIL
WHTMKNYYRILMRRDQVFKVCANHVITKTMELKPLNVSNNALVWTASDYADGEAKVEQLAVRF

KTKEVADCFKKTFEECQQNLMKLQKGHVSLAAELSKETNPVVFFDVCADGEPLGRITMELFSN 
IVPRTAENFRALCTGEKGFGFKNSIFHRVIPDFVCQGGDITKHDGTGGQSIYGDKFEDENFDV 
KHTGPGLLSMANQGQNTNNSQFVITLKKAEHLDFKHVVFGFVKDGMDTVKKIESFGSPKGSVC
RRITITECGQI 
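
The relationship between the DNA sequence and the encoded polypeptide in Table 90A can be illustrated by translating the first codons of the ORF. The following is a minimal Python sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and are not taken from the specification. Whitespace is stripped before translation, so a sequence line that contains stray spaces is read the same way as a clean one.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame from its first base until a stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 bases of the NOV90a ORF, beginning at the ATG at position 114.
print(translate("ATGAGGCGCAGCAAGGCTGACGTGGAGCGG"))  # MRRSKADVER
```

The output matches the first ten residues of the NOV90a protein sequence, MRRSKADVER.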




SEQ ID NO: 327


5332 bp 




NOV90b, 
CG94377-02 DNA 
Sequence 


ACGCGTCTCGGGAGCCAGGTTGGCGGCGCGATGAGGCGCAGCAAGGCCGATGTGGAGCGGTAC 
GTCGCCTCGGTGCTGGGTCTCACCCCGTCGCCTCGACAGAAGTCAATGAAAGGATTCTATTTT 
GCAAAGCTGTATTATGAAGCTAAAGAATATGATCTTGCTAAAAAGTACGTATGTACTTACCTT
AGTGTGCAAGAGAGGGATCCCAGAGCTCACAGATTTCTGGGTCTTCTTTATGAATTGGAAGAA 
AACACAGAGAAAGCCGTTGAATGTTACAGGCGTTCACTGGAATTAAACCCACCACAAAAAGAT 
CTTGTGTTGAAGATTGCAGAATTGCTTTGTAAAAATGATGTTACTGATGGAAGAGCAAAATAC 
TGGGTTGAAAGGGCAGCGAAACTTTTCCCAGGAAGTCCTGCAATTTATAAACTAAAGCATCTT 
CTAGATTGTGAAGGTGAAGATGGATGGAATAAACTTTTTGACTGGATTCAGTCAGAACTTTAT
GTAAGACCTGATGACGTCCATATGAACATCCGGCTAGTGGAGTTGTATCGCTCAAATAAAAGA 
TTGAAGGATGCTGTGGCCCGCTGCCATGAGGCAGAGAGGAACATAGCTTTGCGTTCAAGTTTA 
GAGTGGAATTCGTGTGTTGTACAGACCCTTAAGGAATATCTGGAGTCTTTACAGTGTTTGGAG 
TCTGATAAAAGTGACTGGCGAGCAACCAATACAGACTTACTGCTGGCCTATGCTAATCTTATG 
CTTCTTACGCTTTCCACTAGAGATGTGCAGGAAAGTAGAGAATTACTGGAAAGTTTTGATAGT 
GCTCTTCAGTCTGCAAAATCTTCTTTGGGTGGAAATGATGAACTGTCAGCTACTTTCTTAGAA 
ATGAAAGGACATTTCTACATGCATGCTGGTTCTCTGCTCTTGAAGATGGGTCAGCATGGTAAT 
AATGTTCAATGGCAAGCTCTTTCTGAGCTGGCTGCATTGTGCTATGTCATAGCATTTCAGGTT 
CCAAGACCAAAGATTAAATTAATAAAAGGTGAAGCTGGACAAAATCTGCTGGAAATGATGGCC 
TGTGACCGACTGAGCCAATCAGGGCATATGTTGCTAAACTTAAGTCGTGGCAAGCAAGATTTT 
TTAAAAGAGGTTGTTGAAACTTTTGCCAACAAAAGCGGGCAGTCTGTGTTATATAATGCTCTG 
TTTTCTAGTCAGTCATCTAAGGATACATCTTTTCTTGGTAGCGATGATATTGGAAACATTGAT 
GTACAAGAACCAGAGCTTGAAGATTTGGCTAGATACGATGTTGGTGCTATTCAAGCACATAAT 
GGTAGTCTTCAGCACCTTACTTGGCTTGGCTTACAGTGGAATTCATTGCCTGCTTTACCTGGA 
ATCCGAAAATGGCTAAAACAGCTTTTCCATCATTTGCCCCAGGAAACCTCAAGGCTTGAAACA 
AATGCACCTGAATCAATATGTATTTTAGATCTTGAAGTATTTCTCCTTGGAGTAGTATATACC 
AGCCACTTACAATTAAAGGAGAAATGTAATTCTCACCACAGCTCCTATCAGCCGTTATGCCTG 
CCCCTTCCTGTGTGTAAACGGCTTTGTACAGAGAGACAAAAATCTTGGTGGGATGCGGTTTGT 
ACTCTGATTCACAGAAAAGCAGTGAACTCAGCAGAATTGAGACTTGTAGTTCAGCATGAAATA 
AACACTCTAAGAGCCCAGGAAAAACATGGCCTTCAACCTGCTCTGCTTGTACATTGGGCAAAA
TGCCTTCAGAAAGGCAGGGGTCTTAATTCTTCTTATGATCAACAAGAATACATAGGGAGAAGT 
GTTCATTATTGGAAGAAAGTTTTGCCATTGTTGAAGATAATAAAGAAGAACAGTATTCCTGAA 
CCTATTGATCCTCTGTTTAAACATTTTCATAGTGTAGACATTCAGGCATCAGAAATTGTTGAG 
TATGAAGAAGATGCACACATAACTTTTGCTATATTGGATGCAGTACATGGAAATATAGAAGAT 
GCTGTGACTGCTTTTGAATCTATAAAAAGTGTTGTTTCTTATTGGAATCTTGCACTGATTTTT 
CACAGGAAAGCAGAAGACATTGAAAATGATGCCGTTTTTCCTGAAGAACAAGAAGAATGCAAA 
AATTATCTGAGAAAGACCAGGGACTACCTAATAAAGATTATAGATGACAGTGATTCAAATCTT 
TCAGTGGTCAAGAAAGTAAGTGTGCCCCTGGAGTCTGTAAAAGAGATGCTTAAGTCAGTCATG 
CAGGAACTCGAAGACTATAGTGAAGGAGGTCCTCTCTATAAAAATGGTTCTTTGCGAAATGCA 
GATTCAGAAATAAAACATTCTACACCATCTCCTACCAAATATTCACTATCACCAAGTAAAAGT 
TACAAGTATTCTCCCAAAACACCACCTCGATGGGCAGAAGATCAGAATTCTTTACGGAAAATG 
ATTTGCCAAGAAGTAAAGGCCATTAAGAAAGAAATGCAGGAGTTGAAACTAAATAGCAGTAAG 
TCAGCATCCCGTCATCGTTGGCCCACAGAGAATTATGGACCAGACTCGGTGCCTGATGGATAT 
CAGGGGTCACAGACATTTCATGGGGCTCCACTAACAGTTGCAACTACTGGCCCTTCAGTATAT 
TATAGTCAGTCACCAGCATATAATTCCCAGTATCTTCTCAGACCAGCAGCTAATGTTACTCCC 
ACAAAGGGTTCTTCTAATACAGAATTTAAGTCAACCAAAGAAGGATTTTCCATCCCTGTGTCT
GCTGATGGATTTAAATTTGGCATTTCGGAACCAGGAAATCAAGAAAAGAAAAGTGAAAAGCCT
CTTGAAAATGATACTGGCTTCCAGGCTCAGGATATTAGTGGCCAGAAGAATGGCCGTGGTGTG
ATTTTTGGCCAAACAAGTAGCACTTTTACATTTGCAGATGTTGCAAAATCAACTTCAGGAGAA
GGATTTCAGTTTGGCAAAAAAGACCCCAATTTCAAGGGATTTTCAGGTGCTGGAGAAAAATTA
TTCTCATCACAATGCGGTAAAATGGCCAATAAAGCAAACACTTCCGGTGACTTTGAGAAAGAT
GATGATGCCTGTAAGACTGAGGACAGCGATGACATCCATTTTGAACCAGTAGTTCAAATGCCT 
GAAAAAGTAGAACTTGTAACAGGAGAAGAAGGTGAAAAAGTTCTGTATTCACAGGGGGTAAAA 
CTATTTAGATTTGATGCTGAGATAAGTCAGTGGAAAGAAAGGGGCTTGGGGAACTTAAAAATT 
CTCAAAAATGAGGTCAATGGCAAACCAAGAATGCTGATGCGAAGAGACCAAGTACTAAAAGTG 
TGTGCTAATCATTGGATAACAACTACAATGAACCTGAAGCCCCTCTCTGGATCAGATAGAGCA 
TGGATGTGGTTAGCCAGTGATTTCTCTGATGGTGATGCCAAACTAGAGCGGTTGGCAGCACAA 
TTTAAAACACCAGAGCTGGCTGAAGAATTCAAGCAGAAATTTGAGGAATGCCAGCGGCTTCTG 
TTAGACATACCACTTCAAACTCCCCATAAACTTGTAGATACTGGCAGAGCTGCCAAGCTAATA 
CAGAGAGCTGAAGAAATGAAGAGTGGACTGAAAGATTTCAAAACGTTTTTGACAAATGATCAA 
ACAAAAGTCACTGAGGAAGAAAATAAGGGTTCAGGTACAGGTGCAGCCGGTGCCTCAGACACA 
ACAATAAAACCCAATCCTGAAAACACTGGGCCCACATTAGAATGGGATAACTATGATTTAAGG 
GAAGATGCTTTGGATGATAATGTTAGTAGTAGCTCAGTACATGATTCTCCGTTGGCAAGTAGC 
CCTGTGAGAAAAAATATTTTCCGCTTTGATGAGTCAACAACAGGATTTAACTTCAGTTTTAAA 
TCTGCTTTGAGTCTATCTAAGTCTCCTGCCAAGTTGAATCAGAGTGGGACTTCAGTTGGCACT
GATGAAGAATCTGATGTTACTCAAGAAGAAGAGAGAGATGGACAGTACTTTGAACCTGTTGTT 
CCTTTACCTGATCTAGTTGAAGTATCCAGTGGTGAGGAAAATGAACAAGTTGTTTTTAGTCAC 
ATGGCAGAACTCTACAGATATGATAAAGATGTTGGTCAATGGAAAGAAAGGGGCATTGGTGAT 
ATAAAGATTTTACAGAATTATGATAATAAGCAAGTTCGTATAGTGATGAGAAGGGACCAAGTA 
TTAAAACTTTGTGCCAATCACAGAATAACTCCAGACATGAGTTTGCAAAATATGAAAGGGACA 
GAAAGAGTATGGGTGTGGACTGCATGTGATTTTGCAGATGGAGAAAGAAAAGTAGAGCATTTA 
GCTGTTCGTTTTAAACTACAGGATGTTGCAGACTCATTTAAGAAAATTTTTGATGAAGCAAAA 
ACAGCCCAGGAAAAAGATTCTTTGATAACACCTCATGTTTCTCGGTCAAGCACTCCCAGAGAG 
TCACCATGTGGCAAAATTGCTGTAGCTGTATTAGAAGAAACCACAAGAGAGAGGACAGATGTT 
ATTCAGGGTGATGATGTAGCAGATGCAGCTTCAGAAGTTGAAGTGTCTAGCACATCTGAAACA 
ACAACAAAAGCAGTGGTTTCTCCTCCAAAGTTTGTATTTGGTTCAGAGTCTGTTAAAAGAATT 
TTTAGTAGTGAAAAATCAAACCCATTTGCATTTGGCAACAGTTCTGCCACTGGGTCTTTGTTT 
GGATTTAGTTTTAATGCACCTTTGAAAAGTAACGATAGTGAAACTAGTTCAGTAGCCCAGAGT 
GGATCTGAAAGCAAAGTGGAACCTAAAAAATGTGAACTGTCAAAGAACTCTGATATCGAACAG 
TCTTCAGATAGCAAAGTCAAAAATCTCTCTGCTTCCTTTCCAATGGAAGAATCTTCAATCAAC 
TACACATTTAAAACACCAGAAAAGGAGCCTCCATTATGGCATGCTGAATTTACCAAAGAAGAA 
TTGGTTCAGAAGCTCAGTTCCACCACAAAAAGTGCAGATCAGTTAAACGGCCTGCTTCGGGAA 
ACAGAGGCAACCAGTGCAGTCCTTATGGAGCAAATTAAGCTTCTCAAAAGTGAAATAAGAAGA
TTGGAAAGGAATCAAGAGGAGTCTGCAGCTAACGTGGAACACTTGAAGAACGTCTTGCTGCAG 
TTCATTTTCTTGAAGCCAGGTAGTGAGAGAGAGAGCCTTCTTCCTGTTATAAATACGATGTTG 
CAGCTCAGCCCTGAAGAAAAGGGAAAACTTGCTGCGGTTGCTCAAGGTCTTCAACAAACCTCC 
ATACCCAAGAAAAAATAGAAAGCACCATGTTCTACTATGG 




ORF Start: at 1

ORF Stop: TAG at 5308

SEQ ID NO: 328    1769 aa

MW at 198802.2kD


NOV90b, 

CG94377-02 Protein 
Sequence 



TRLGSQVGGAMRRSKADVERYVASVLGLTPSPRQKSMKGFYFAKLYYEAKEYDLAKKYVCTYL 
SVQERDPRAHRFLGLLYELEENTEKAVECYRRSLELNPPQKDLVLKIAELLCKNDVTDGRAKY 
WVERAAKLFPGSPAIYKLKHLLDCEGEDGWNKLFDWIQSELYVRPDDVHMNIRLVELYRSNKR 
LKDAVARCHEAERNIALRSSLEWNSCVVQTLKEYLESLQCLESDKSDWRATNTDLLLAYANLM
LLTLSTRDVQESRELLESFDSALQSAKSSLGGNDELSATFLEMKGHFYMHAGSLLLKMGQHGN
NVQWQALSELAALCYVIAFQVPRPKIKLIKGEAGQNLLEMMACDRLSQSGHMLLNLSRGKQDF 
LKEVVETFANKSGQSVLYNALFSSQSSKDTSFLGSDDIGNIDVQEPELEDLARYDVGAIQAHN
GSLQHLTWLGLQWNSLPALPGIRKWLKQLFHHLPQETSRLETNAPESICILDLEVFLLGVVYT
SHLQLKEKCNSHHSSYQPLCLPLPVCKRLCTERQKSWWDAVCTLIHRKAVNSAELRLVVQHEI
NTLRAQEKHGLQPALLVHWAKCLQKGRGLNSSYDQQEYIGRSVHYWKKVLPLLKIIKKNSIPE 
PIDPLFKHFHSVDIQASEIVEYEEDAHITFAILDAVHGNIEDAVTAFESIKSVVSYWNLALIF
HRKAEDIENDAVFPEEQEECKNYLRKTRDYLIKIIDDSDSNLSVVKKVSVPLESVKEMLKSVM
QELEDYSEGGPLYKNGSLRNADSEIKHSTPSPTKYSLSPSKSYKYSPKTPPRWAEDQNSLRKM 
ICQEVKAIKKEMQELKLNSSKSASRHRWPTENYGPDSVPDGYQGSQTFHGAPLTVATTGPSVY 
YSQSPAYNSQYLLRPAANVTPTKGSSNTEFKSTKEGFSIPVSADGFKFGISEPGNQEKKSEKP 
LENDTGFQAQDISGQKNGRGVIFGQTSSTFTFADVAKSTSGEGFQFGKKDPNFKGFSGAGEKL 
FSSQCGKMANKANTSGDFEKDDDACKTEDSDDIHFEPVVQMPEKVELVTGEEGEKVLYSQGVK
LFRFDAEISQWKERGLGNLKILKNEVNGKPRMLMRRDQVLKVCANHWITTTMNLKPLSGSDRA 
WMWLASDFSDGDAKLERLAAQFKTPELAEEFKQKFEECQRLLLDIPLQTPHKLVDTGRAAKLI 
QRAEEMKSGLKDFKTFLTNDQTKVTEEENKGSGTGAAGASDTTIKPNPENTGPTLEWDNYDLR 
EDALDDNVSSSSVHDSPLASSPVRKNIFRFDESTTGFNFSFKSALSLSKSPAKLNQSGTSVGT 
DEESDVTQEEERDGQYFEPVVPLPDLVEVSSGEENEQVVFSHMAELYRYDKDVGQWKERGIGD
IKILQNYDNKQVRIVMRRDQVLKLCANHRITPDMSLQNMKGTERVWVWTACDFADGERKVEHL 
AVRFKLQDVADSFKKIFDEAKTAQEKDSLITPHVSRSSTPRESPCGKIAVAVLEETTRERTDV 
IQGDDVADAASEVEVSSTSETTTKAVVSPPKFVFGSESVKRIFSSEKSNPFAFGNSSATGSLF 
GFSFNAPLKSNDSETSSVAQSGSESKVEPKKCELSKNSDIEQSSDSKVKNLSASFPMEESSIN
YTFKTPEKEPPLWHAEFTKEELVQKLSSTTKSADQLNGLLRETEATSAVLMEQIKLLKSEIRR 
LERNQEESAANVEHLKNVLLQFIFLKPGSERESLLPVINTMLQLSPEEKGKLAAVAQGLQQTS 
IPKKK 



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 90B. 

Table 90B. Comparison of NOV90a against NOV90b.

Protein Sequence | NOV90a Residues/Match Residues | Identities/Similarities for the Matched Region
NOV90b | 1..900/11..906 | 795/901 (88%); 823/901 (91%)



Further analysis of the NOV90a protein yielded the following properties shown in 
Table 90C. 



Table 90C. Protein Sequence Properties NOV90a

PSort analysis: 0.6000 probability located in endoplasmic reticulum (membrane); 0.3000 probability located in microbody (peroxisome); 0.2525 probability located in mitochondrial inner membrane; 0.1000 probability located in nucleus

SignalP analysis: No Known Signal Sequence Predicted



A search of the NOV90a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 90D.



Table 90D. Geneseq Results for NOV90a

Geneseq Identifier | Protein/Organism/Length [Patent #, Date] | NOV90a Residues/Match Residues | Identities/Similarities for the Matched Region | Expect Value
AAW54235 | Human Nup358 protein - Homo sapiens, 3224 aa. [WO9809170-A2, 05-MAR-1998] | 1..3224/1..3224 | 3224/3224 (100%); 3224/3224 (100%) | 0.0
AAM03867 | Peptide #2549 encoded by probe for measuring breast gene expression - Homo sapiens, 164 aa. [WO200157270-A2, 09-AUG-2001] | 1885..2048/1..164 | 157/164 (95%); 159/164 (96%) | 2e-85
AAM28631 | Peptide #2668 encoded by probe for measuring placental gene expression - Homo sapiens, 164 aa. [WO200157272-A2, 09-AUG-2001] | 1885..2048/1..164 | 157/164 (95%); 159/164 (96%) | 2e-85
AAM16137 | Peptide #2571 encoded by probe for measuring cervical gene expression - Homo sapiens, 164 aa. [WO200157278-A2, 09-AUG-2001] | 1885..2048/1..164 | 157/164 (95%); 159/164 (96%) | 2e-85
AAM68322 | probe encoded protein SEQ ID NO: 28628 - Homo sapiens, 164 aa. [WO200157276-A2, 09-AUG-2001] | 1..164 | 159/164 (96%) | 2e-85





In a BLAST search of public sequence databases, the NOV90a protein was found to
have homology to the proteins shown in the BLASTP data in Table 90E.



Table 90E. Public BLASTP Results for NOV90a

Protein Accession Number | Protein/Organism/Length | NOV90a Residues/Match Residues | Identities/Similarities for the Matched Portion | Expect Value
P49792 | Ran-binding protein 2 (RanBP2) (Nuclear pore complex protein Nup358) (Nucleoporin Nup358) (358 kDa nucleoporin) (P270) - Homo sapiens (Human), 3224 aa. | 1..3224/1..3224 | 3224/3224 (100%); 3224/3224 (100%) | 0.0
S58884 | Ran-binding protein 2 - human, 3224 aa. | 1..3224/1..3224 | 3222/3224 (99%); 3223/3224 (99%) | 0.0
Q9ERU9 | Ran-binding protein 2 - Mus musculus (Mouse), 3053 aa. | 1..1656/1..1670 | 1406/1687 (83%); 1505/1687 (88%) | 0.0
P48820 | Ran-binding protein 2 (RanBP2) (Nuclear pore complex protein Nup358) (Nucleoporin Nup358) (358 kDa nucleoporin) (P270) - Bos taurus (Bovine), 1085 aa (fragment). | 2136..3224/1..1085 | 944/1090 (86%); 998/1090 (90%) | 0.0
Q99666 | Sperm membrane protein BS-63 - Homo sapiens (Human), 1765 aa. | 1..900/1..900 | 853/901 (94%); 875/901 (96%) | 0.0






PFam analysis predicts that the NOV90a protein contains the domains shown in the 
Table 90F. 



Table 90F. Domain Analysis of NOV90a

Pfam Domain | NOV90a Match Region | Identities/Similarities for the Matched Region | Expect Value
TPR | 60..93 | 11/34 (32%); 27/34 (79%) | 7.1e-07
Ran_BP1 | 1183..1304 | 87/127 (69%); 120/127 (94%) | 3.6e-90
zf-RanBP | 1351..1381 | 15/32 (47%); 27/32 (84%) | 1.5e-08
zf-RanBP | 1415..1444 | 15/32 (47%); 26/32 (81%) | 1.2e-10
zf-C3HC4 | 1485..1502 | 7/26 (27%); 16/26 (62%) | 0.76
zf-RanBP | 1479..1508 | 18/32 (56%); 28/32 (88%) | 3.3e-12
zf-RanBP | 1543..1572 | 18/32 (56%); 28/32 (88%) | 8.9e-12
zf-RanBP | 1606..1635 | 17/32 (53%); 29/32 (91%) | 1e-12
zf-RanBP | 1665..1694 | 17/32 (53%); 29/32 (91%) | 1e-12
zf-RanBP | 1724..1753 | 17/32 (53%); 28/32 (88%) | 1.1e-10
zf-RanBP | 1781..1810 | 19/32 (59%); 29/32 (91%) | 1.7e-12
Ran_BP1 | 2024..2145 | 86/127 (68%); 120/127 (94%) | 9.7e-88
Ran_BP1 | 2321..2442 | 77/127 (61%); 121/127 (95%) | 1.1e-85
Ran_BP1 | 2922..3043 | 84/127 (66%); 122/127 (96%) | 6.8e-92
pro_isomerase | 3065..3224 | 100/179 (56%); 140/179 (78%) | 1.7e-90
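
The match regions in Table 90F can be compared against one another to see which predicted domains share residues. The following is a minimal Python sketch, assuming the regions are inclusive residue ranges and reading OCR-damaged digits as the numerals they most plausibly represent; the names `HITS` and `overlapping_pairs` are illustrative only.

```python
from itertools import combinations

# (Pfam domain, first residue, last residue) as read from Table 90F.
HITS = [
    ("TPR", 60, 93),
    ("Ran_BP1", 1183, 1304),
    ("zf-RanBP", 1351, 1381),
    ("zf-RanBP", 1415, 1444),
    ("zf-C3HC4", 1485, 1502),
    ("zf-RanBP", 1479, 1508),
    ("zf-RanBP", 1543, 1572),
    ("zf-RanBP", 1606, 1635),
    ("zf-RanBP", 1665, 1694),
    ("zf-RanBP", 1724, 1753),
    ("zf-RanBP", 1781, 1810),
    ("Ran_BP1", 2024, 2145),
    ("Ran_BP1", 2321, 2442),
    ("Ran_BP1", 2922, 3043),
    ("pro_isomerase", 3065, 3224),
]


def overlapping_pairs(hits):
    """Return every pair of hits whose inclusive residue ranges intersect."""
    return [
        (a, b)
        for a, b in combinations(hits, 2)
        if a[1] <= b[2] and b[1] <= a[2]
    ]


for first, second in overlapping_pairs(HITS):
    print(first, "overlaps", second)
```

With the values as read here, the only overlapping pair is the zf-C3HC4 region (1485..1502), which lies within the zf-RanBP region at 1479..1508.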



Example 91. 

The NOV91 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 91A.



Table 91A. NOV91 Sequence Analysis 



NOV91a,
CG97090-01 DNA
Sequence

SEQ ID NO: 329    1908 bp



ACAGGTGACTTTTCCACAGGAACTTCTGCAATGTCCCATCAACCTCTCAGCTGCTGGAATTCG 



CCCTTATCCTCCCACCTGGATCTCCCAAACCTGGACACATTTACCCCGGAGGAGCTGCTGCAG 
CAGATGAAAGAGCTCCTGACCGAGAACCACCAGCTGAAAGAAGCCATGAAGCTAAATAATCAA 
GCCATGAAAGGGAGATTTGAGGAGCTTTCGGCCTGGACAGAGAAACAGAAGGAAGAACGCCAG 
TTTTTTGAGATACAGAGCAAAGAAGCAAAAGAGCGTCTAATGGCCTTGAGTCATGAGAATGAG 
AAATTGAAGGAAGAGCTTGGAAAACTAAAAGGGAAATCAGAAAGGTCATCTGAGGACCCCACT 
GATGACTCCAGGCTTCCCAGGGCCGAAGCGGAGCAGGAAAAGGACCAGCTCAGGACCCAGGTG 
GTGAGGCTACAAGCAGAGAAGGCAGACCTGTTGGGCATCGTGTCTGAACTGCAGCTCAAGCTG 
AACTCCAGCGGCTCCTCAGAAGATTCCTTTGTTGAAATTAGGATGGCTGAAGGAGAAGCAGAA 
GGGTCAGTAAAAGAAATCAAGCATAGTCCTGGGCCCACGAGAACAGTCTCCACTGGCACGAGC 
AGATCTGCAGATGGGGCCAAGAATTACTTCGAACATGAGGAGTTAACTGTGAGCCAGCTCCTG 
CTGTGCCTAAGGGAAGGGAATCAGAAGGTGGAGAGACTTGAAGTTGCACTCAAGGAGGCCAAA 
GAAAGAGTTTCAGATTTTGAAAAGAAAACAAGTAATCGTTCTGAGATTGAAACCCAGACAGAG 
GGGAGCACAGAGAAAGAGAATGATGAAGAGAAAGGCCCGGAGACTGTTGGAAGCGAAGTGGAA 
GCACTGAACCTCCAGGTGACATCTCTGTTTAAGGAGCTTCAAGAGGCTCATACAAAACTCAGC
GAAGCTGAGCTAATGAAGAAGAGACTTCAAGAAAAGTGTCAGGCCCTTGAAAGGAAAAATTCT 
GCAATTCCATCAGAGTTGAATGAAAAGCAAGAGCTTGTTTATACTAACAAAAAGTTAGAGCTA 
CAAGTGGAAAGCATGCTATCAGAAATCAAAATGGAACAGGCTAAAACAGAGGATGAAAAGTCC 
AAATTAACTGTGCTACAGATGACACACAACAAGCTTCTTCAAGAACATAATAATGCATTGAAA
ACAATTGAGGAACTAACAAGAAAAGAGTCAGAAAAAGTGGACAGGGCAGTGCTGAAGGAACTG 
AGTGAAAAACTGGAACTGGCAGAGAAGGCTCTGGCTTCCAAACAGCTGCAAATGGATGAAATG 
AAGC AAAC CATTGC CAAGC AGG AAG AGG ACC TGG AAACC ATG AC C AT CCTC AGGGCTCAG ATG 
GAAGTTTACTGTTCTGATTTTCATGCTGAAAGAGC AG CG AG AG AG AAAATTC ATG AGGAAAAG 
GAGCAACTGGCATTGCAGCTGGCAGTTCTGCTGAAAGAGAATGATGCTTTCGAAGACGGAGGC 
AGGCAGTCCTTGATGGAGATGCAGAGTCGTCATGGGGCGAGAACAAGTGACTCTGACCAGCAG 
GCTTACCTTGTTCAAAGAGGAGCTGAGGACAGGGACTGGCGGCAACAGCGGAATATTCCGATT 
CATTCCTGCCCCAAGTGTGGAGAGGTTCTGCCTGACATAGACACGTTACAGATTCACGTGATG 
GATTGCATCATTTAAGTGTTGATGTATCACCTCCCCAAAACTGTTGGTAAATGTCAGATTTTT 




TCCTCCAAGAGTTGTGCTTTTGTGTTATTTGTTTTCACTCAAATATTTTGCCTCATTATTCTT 




GTTTTAAAAGAAAGAAAACAGGCCGGGCACAGTGGCTCATGCCTGTAATCCCAGCACTTTGGG 




AGATCCAGGTGGGAGGAT 


ORF Start: ATG at 31


ORF Stop: TAA at 1714




SEQ ID NO: 330


561 aa 


MW at 64267.6kD 


NOV91a, 

CG97090-01 Protein
Sequence 


MSHQPLSCWNSPLSSHLDLPNLDTFTPEELLQQMKELLTENHQLKEAMKLNNQAMKGRFEELS 
AWTEKQKEERQFFEIQSKEAKERLMALSHENEKLKEELGKLKGKSERSSEDPTDDSRLPRAEA 
EQEKDQLRTQVVRLQAEKADLLGIVSELQLKLNSSGSSEDSFVEIRMAEGEAEGSVKEIKHSP
GPTRTVSTGTSRSADGAKNYFEHEELTVSQLLLCLREGNQKVERLEVALKEAKERVSDFEKKT 
SNRSEIETQTEGSTEKENDEEKGPETVGSEVEALNLQVTSLFKELQEAHTKLSEAELMKKRLQ 
EKCQALERKNSAIPSELNEKQELVYTNKKLELQVESMLSEIKMEQAKTEDEKSKLTVLQMTHN 
KLLQEHNNALKTIEELTRKESEKVDRAVLKELSEKLELAEKALASKQLQMDEMKQTIAKQEED 
LETMTILRAQMEVYCSDFHAERAAREKIHEEKEQLALQLAVLLKENDAFEDGGRQSLMEMQSR 
HGARTSDSDQQAYLVQRGAEDRDWRQQRNIPIHSCPKCGEVLPDIDTLQIHVMDCII 




SEQ ID NO: 331


1858 bp 




NOV91b, 
CG97090-04 DNA 
Sequence 


ATCCTCCCACCTGGATCTCCCAAACCTGGACACGTTTACCCCGGAGGAGCTGCTGCAGCAGAT


GAAAGAGCTCCTGACCGAGAACCACCAGCTGAAAGAAGCCATGAAGCTAAATAATCAAGCCAT 
GAAAGGGAGATTTGAGGAGCTTTCGGCCTGGACAGAGAAACAGAAGGAAGAACGCCAGTTTTT 
TGAGATACAGAGCAAAGAAGCAAAAGAGCGTCTAATGGCCTTGAGTCATGAGAATGAGAAATT 
GAAGGAAGAGCTTGGAAAACTAAAAGGGAAATCAGAAAGGTCATCTGAGGACCCCACTGATGA 
CTCCAGGCTTCCCAGGGCCGAAGCGGAGCAGGAAAAGGACCAGCTCAGGACCCAGGTGGTGAG 
GCTACAAGCAGAGAAGGCAGACCTGTTGGGCATCGTGTCTGAACTGCAGCTCAAGCTGAACTC 
CAGCGGCTCCTCAGAAGATTCCTTTGTTGAAATTAGGATGGCTGAAGGAGAAGCAGAAGGGTC 
AGTAAAAGAAATCAAGCATAGTCCTGGGCCCACGAGAACAGTCTCCACTGGCACGGCATTGTC 
TAAATATAGGAGCAGATCTGCAGATGGGGCCAAGAATTACTTCGAACATGAGGAGTTAACTGT 
GAGCCAGCTCCTGCTGTGCCTAAGGGAAGGGAATCAGAAGGTGGAGAGACTTGAAGTTGCACT 
CAAGGAGGCCAAAGAAAGAGTTTCAGATTTTGAAAAGAAAACAAGTAATCGTTCTGAGATTGA 
AACCCAGACAGAGGGGAGCACAGAGAAAGAGAATGATGAAGAGAAAGGCCCGGAGACTGTTGG
AAGCGAAGTGGAAGCACTGAACCTCCAGGTGACATCTCTGTTTAAGGAGCTTCAAGAGGCTCA 
TACAAAACTCAGCGAAGCTGAGCTAATGAAGAAGAGACTTCAAGAAAAGTGTCAGGCCCTTGA 
AAGGAAAAATTCTGCAATTCCATCAGAGTTGAATGAAAAGCAAGAGCTTGTTTATACTAACAA 
AAAGTTAGAGCTACAAGTGGAAAGCATGCTATCAGAAATCAAAATGGAACAGGCTAAAACAGA 
GGATGAAAAGTCCAAATTAACTGTGCTACAGATGACACACAACAAGCTTCTTCAAGAACATAA 
TAATGCATTGAAAACAATTGAGGAACTAACAAGAAAAGAGTCAGAAAAAGTGGACAGGGCAGT 
GCTGAAGGAACTGAGTGAAAAACTGGAACTGGCAGAGAAGGCTCTGGCTTCCAAACAGCTGCA 
AATGGATGAAATGAAGCAAACCATTGCCAAGCAGGAAGAGGACCTGGAAACCATGACCATCCT
CAGGGCTCAGATGGAAGTTTACTGTTCTGATTTTCATGCTGAAAGAGCAGCGAGAGAGAAAAT 
TCATGAGGAAAAGGAGCAACTGGCATTGCAGCTGGCAGTTCTGCTGAAAGAGAATGATGCTTT 
CGAAGACGGAGGCAGGCAGTCCTTGATGGAGATGCAGAGTCGTCATGGGGCGAGAACAAGTGA 
CTCTGACCAGCAGGCTTACCTTGTTCAAAGAGGAGCTGAGGACAGGGACTGGCGGCAACAGCG 
GAATATTCCGATTCATTCCTGCCCCAAGTGTGGAGAGGTTCTGCCTGACATAGACACGTTACA
GATTCACGTGATGGATTGCATCATTTAAGTGTTAATGTATCACCTCCCCAAAACTGTTGGTAA 




ATGTCAGATTTTTTCCTCCAAGAGTTGTGCTTTTGTGTTATTTGTTTTCACTCAAATATTTTG 




CCTCATTATTCTTGTTTTAAAAGAAAGAAAACAGGCCGGGCACAGTGGCTCATGCCTGTAATC 




CCAGCACTTTGGGAGATCCAGGTGGGAGGAT




ORF Start: ATG at 62


ORF Stop: TAA at 1664




SEQ ID NO: 332


534 aa


MW at 61225.3kD


NOV91b,
CG97090-04 Protein
Sequence


MKELLTENHQLKEAMKLNNQAMKGRFEELSAWTEKQKEERQFFEIQSKEAKERLMALSHENEK
LKEELGKLKGKSERSSEDPTDDSRLPRAEAEQEKDQLRTQVVRLQAEKADLLGIVSELQLKLN
SSGSSEDSFVEIRMAEGEAEGSVKEIKHSPGPTRTVSTGTALSKYRSRSADGAKNYFEHEELT
VSQLLLCLREGNQKVERLEVALKEAKERVSDFEKKTSNRSEIETQTEGSTEKENDEEKGPETV
GSEVEALNLQVTSLFKELQEAHTKLSEAELMKKRLQEKCQALERKNSAIPSELNEKQELVYTN
KKLELQVESMLSEIKMEQAKTEDEKSKLTVLQMTHNKLLQEHNNALKTIEELTRKESEKVDRA
VLKELSEKLELAEKALASKQLQMDEMKQTIAKQEEDLETMTILRAQMEVYCSDFHAERAAREK
IHEEKEQLALQLAVLLKENDAFEDGGRQSLMEMQSRHGARTSDSDQQAYLVQRGAEDRDWRQQ
RNIPIHSCPKCGEVLPDIDTLQIHVMDCII



SEQ ID NO: 333



1857 bp 



NOV91c, 
CG97090-03 DNA 
Sequence 



TGCTGGAATTCGCCCTTATCCTCCCACCTGGATCTCCCAAACCTGGACACATTTACCCCGGAG 



GAGCTGCTGCAGCAGATGAAAGAGCTCCTGACCGAGAACCACCAGCTGAAAGAAGCCATGAAG 



CTAAATAATCAAGCCATGAAAGGGAGATTTGAGGAGCTTTCGGCCTGGACAGAGAAACAGAAG 
GAAGAACGCCAGTTTTTTGAGATACAGAGCAAAGAAGCAAAAGAGCGTCTAATGGCCTTGAGT 
CATGAGAATGAGAAATTGAAGGAAGAGCTTGGAAAACTAAAAGGGAAATCAGAAAGGTCATCT
GAGGACCCCACTGATGACTCCAGGCTTCCCAGGGCCGAAGCGGAGCAGGAAAAGGACCAGCTC 
AGGACCCAGGTGGTGAGGCTACAAGCAGAGAAGGCAGACCTGTTGGGCATCGTGTCTGAACTG 
CAGCTCAAGCTGAACTCCAGCGGCTCCTCAGAAGATTCCTTTGTTGAAATTAGGATGGCTGAA 
GGAGAAGCAGAAGGGTCAGTAAAAGAAATCAAGCATAGTCCTGGGCCCACGAGAACAGTCTCC
ACTGGCACGAGCAGATCTGCAGATGGGGCCAAGAATTACTTCGAACATGAGGAGTTAACTGTG
AGCCAGCTCCTGCTGTGCCTAAGGGAAGGGAATCAGAAGGTGGAGAGACTTGAAGTTGCACTC 
AAGGAGGCCAAAGAAAGAGTTTCAGATTTTGAAAAGAAAACAAGTAATCGTTCTGAGATTGAA 
ACCCAGACAGAGGGGAGCACAGAGAAAGAGAATGATGAAGAGAAAGGCCCGGAGACTGTTGGA
AGCGAAGTGGAAGCACTGAACCTCCAGGTGACATCTCTGTTTAAGGAGCTTCAAGAGGCTCAT 
ACAAAACTCAGCGAAGCTGAGCTAATGAAGAAGAGACTTCAAGAAAAGTGTCAGGCCCTTGAA 
AGGAAAAATTCTGCAATTCCATCAGAGTTGAATGAAAAGCAAGAGCTTGTTTATACTAACAAA 
AAGTTAGAGCTACAAGTGGAAAGCATGCTATCAGAAATCAAAATGGAACAGGCTAAAACAGAG 
GATGAAAAGTCCAAATTAACTGTGCTACAGATGACACACAACAAGCTTCTTCAAGAACATAAT 
AATGCATTGAAAACAATTGAGGAACTAACAAGAAAAGAGTCAGAAAAAGTGGACAGGGCAGTG
CTGAAGGAACTGAGTGAAAAACTGGAACTGGCAGAGAAGGCTCTGGCTTCCAAACAGCTGCAA 
ATGGATGAAATGAAGCAAACCATTGCCAAGCAGGAAGAGGACCTGGAAACCATGACCATCCTC 
AGGGCTCAGATGGAAGTTTACTGTTCTGATTTTCATGCTGAAAGAGCAGCGAGAGAGAAAATT 
CATGAGGAAAAGGAGCAACTGGCATTGCAGCTGGCAGTTCTGCTGAAAGAGAATGATGCTTTC 
GAAGACGGAGGCAGGCAGTCCTTGATGGAGATGCAGAGTCGTCATGGGGCGAGAACAAGTGAC 
TCTGACCAGCAGGCTTACCTTGTTCAAAGAGGAGCTGAGGACAGGGACTGGCGGCAACAGCGG 
AATATTCCGATTCATTCCTGCCCCAAGTGTGGAGAGGTTCTGCCTGACATAGACACGTTACAG 
ATTCACGTGATGGATTGCATCATTTAAGTGTTGATGTATCACCTCCCCAAAACTGTTGGTAAA 



TGTCAGATTTTTTCCTCCAAGAGTTGTGCTTTTGTGTTATTTGTTTTCACTCAAATATTTTGC 



CTCATTATTCTTGTTTTAAAAGAAAGAAAACAGGCCGGGCACAGTGGCTCATGCCTGTAATCC 



CAGCACTTTGGGAGATCCAGGTGGGAGGAT 



ORF Start: ATG at 79 



ORF Stop: TAA at 1663 



SEQ ID NO: 334



528 aa 



MW at 60506.5kD 



NOV91c, 

CG97090-03 Protein 
Sequence 



MKELLTENHQLKEAMKLNNQAMKGRFEELSAWTEKQKEERQFFEIQSKEAKERLMALSHENEK 
LKEELGKLKGKSERSSEDPTDDSRLPRAEAEQEKDQLRTQVVRLQAEKADLLGIVSELQLKLN
SSGSSEDSFVEIRMAEGEAEGSVKEIKHSPGPTRTVSTGTSRSADGAKNYFEHEELTVSQLLL 
CLREGNQKVERLEVALKEAKERVSDFEKKTSNRSEIETQTEGSTEKENDEEKGPETVGSEVEA 
LNLQVTSLFKELQEAHTKLSEAELMKKRLQEKCQALERKNSAIPSELNEKQELVYTNKKLELQ 
VESMLSEIKMEQAKTEDEKSKLTVLQMTHNKLLQEHNNALKTIEELTRKESEKVDRAVLKELS 
EKLELAEKALASKQLQMDEMKQTIAKQEEDLETMTILRAQMEVYCSDFHAERAAREKIHEEKE
QLALQLAVLLKENDAFEDGGRQSLMEMQSRHGARTSDSDQQAYLVQRGAEDRDWRQQRNIPIH
SCPKCGEVLPDIDTLQIHVMDCII



SEQ ID NO: 335



1908 bp 



NOV91d,
CG97090-02 DNA 
Sequence 



ACAGGTGACTTTTCCACAGGAACTTCTGCAATGTCCCATCAACCTCTCAGATCCTCCCACCTG



GATCTCCCAAACCTGGACACGTTTACCCCGGAGGAGCTGCTGCAGCAGATGAAAGAGCTCCTG 
ACCGAGAACCACCAGCTGAAAGAAGCCATGAAGCTAAATAATCAAGCCATGAAAGGGAGATTT 
GAGGAGCTTTCGGCCTGGACAGAGAAACAGAAGGAAGAACGCCAGTTTTTTGAGATACAGAGC
AAAGAAGCAAAAGAGCGTCTAATGGCCTTGAGTCATGAGAATGAGAAATTGAAGGAAGAGCTT
GGAAAACTAAAAGGGAAATCAGAAAGGTCATCTGAGGACCCCACTGATGACTCCAGGCTTCCC
AGGGCCGAAGCGGAGCAGGAAAAGGACCAGCTCAGGACCCAGGTGGTGAGGCTACAAGCAGAG 
AAGGCAGACCTGTTGGGCATCGTGTCTGAACTGCAGCTCAAGCTGAACTCCAGCGGCTCCTCA 
GAAGATTCCTTTGTTGAAATTAGGATGGCTGAAGGAGAAGCAGAAGGGTCAGTAAAAGAAATC 
AAGCATAGTCCTGGGCCCACGAGAACAGTCTCCACTGGCACGGCATTGTCTAAATATAGGAGC 
AGATCTGCAGATGGGGCCAAGAATTACTTCGAACATGAGGAGTTAACTGTGAGCCAGCTCCTG
CTGTGCCTAAGGGAAGGGAATCAGAAGGTGGAGAGACTTGAAGTTGCACTCAAGGAGGCCAAA
GAAAGAGTTTCAGATTTTGAAAAGAAAACAAGTAATCGTTCTGAGATTGAAACCCAGACAGAG 
GGGAGCACAGAGAAAGAGAATGATGAAGAGAAAGGCCCGGAGACTGTTGGAAGCGAAGTGGAA
GCACTGAACCTCCAGGTGACATCTCTGTTTAAGGAGCTTCAAGAGGCTCATACAAAACTCAGC 
GAAGCTGAGCTAATGAAGAAGAGACTTCAAGAAAAGTGTCAGGCCCTTGAAAGGAAAAATTCT 
GCAATTCCATCAGAGTTGAATGAAAAGCAAGAGCTTGTTTATACTAACAAAAAGTTAGAGCTA 
CAAGTGGAAAGCATGCTATCAGAAATCAAAATGGAACAGGCTAAAACAGAGGATGAAAAGTCC 

AAATTAACTGTGCTACAGATGACACACAACAAGCTTCTTCAAGAACATAATAATGCATTGAAA
ACAATTGAGGAACTAACAAGAAAAGAGTCAGAAAAAGTGGACAGGGCAGTGCTGAAGGAACTG
AGTGAAAAACTGGAACTGGCAGAGAAGGCTCTGGCTTCCAAACAGCTGCAAATGGATGAAATG
AAGCAAACCATTGCCAAGCAGGAAGAGGACCTGGAAACCATGACCATCCTCAGGGCTCAGATG
GAAGTTTACTGTTCTGATTTTCATGCTGAAAGAGCAGCGAGAGAGAAAATTCATGAGGAAAAG
GAGCAACTGGCATTGCAGCTGGCAGTTCTGCTGAAAGAGAATGATGCTTTCGAAGACGGAGGC
AGGCAGTCCTTGATGGAGATGCAGAGTCGTCATGGGGCGAGAACAAGTGACTCTGACCAGCAG
GCTTACCTTGTTCAAAGAGGAGCTGAGGACAGGGACTGGCGGCAACAGCGGAATATTCCGATT
CATTCCTGCCCCAAGTGTGGAGAGGTTCTGCCTGACATAGACACGTTACAGATTCACGTGATG
GATTGCATCATTTAAGTGTTAATGTATCACCTCCCCAAAACTGTTGGTAAATGTCAGATTTTT


TCCTCCAAGAGTTGTGCTTTTGTGTTATTTGTTTTCACTCAAATATTTTGCCTCATTATTCTT


GTTTTAAAAGAAAGAAAACAGGCCGGGCACAGTGGCTCATGCCTGTAATCCCAGCACTTTGGG


AGATCCAGGTGGGAGGAT




ORF Start: ATG at 31


ORF Stop: TAA at 1714




SEQ ID NO: 336


561 aa


MW at 64354.8kD


NOV91d, 

CG97090-02 Protein 
Sequence 


MSHQPLRSSHLDLPNLDTFTPEELLQQMKELLTENHQLKEAMKLNNQAMKGRFEELSAWTEKQ 
KEERQFFEIQSKEAKERLMALSHENEKLKEELGKLKGKSERSSEDPTDDSRLPRAEAEQEKDQ
LRTQVVRLQAEKADLLGIVSELQLKLNSSGSSEDSFVEIRMAEGEAEGSVKEIKHSPGPTRTV
STGTALSKYRSRSADGAKNYFEHEELTVSQLLLCLREGNQKVERLEVALKEAKERVSDFEKKT 
SNRSEIETQTEGSTEKENDEEKGPETVGSEVEALNLQVTSLFKELQEAHTKLSEAELMKKRLQ 
EKCQALERKNSAIPSELNEKQELVYTNKKLELQVESMLSEIKMEQAKTEDEKSKLTVLQMTHN
KLLQEHNNALKTIEELTRKESEKVDRAVLKELSEKLELAEKALASKQLQMDEMKQTIAKQEED 
LETMTILRAQMEVYCSDFHAERAAREKIHEEKEQLALQLAVLLKENDAFEDGGRQSLMEMQSR 
HGARTSDSDQQAYLVQRGAEDRDWRQQRNIPIHSCPKCGEVLPDIDTLQIHVMDCII 
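

The ORF coordinates and protein lengths listed in Table 91A can be cross-checked arithmetically. The following is a minimal sketch, assuming that the "ORF Start" and "ORF Stop" values are 1-based positions of the first base of the start and stop codons; the function name `orf_length_aa` and the `ORFS` list are illustrative and are not part of the original disclosure.

```python
# (clone, ORF start, ORF stop, reported protein length in aa), as listed in Table 91A
ORFS = [
    ("NOV91a", 31, 1714, 561),
    ("NOV91b", 62, 1664, 534),
    ("NOV91c", 79, 1663, 528),
    ("NOV91d", 31, 1714, 561),
]


def orf_length_aa(start, stop):
    """Number of codons between the start codon (inclusive) and the stop codon (exclusive).

    Assumes both coordinates are 1-based positions of the first base of the codon.
    """
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


for clone, start, stop, reported in ORFS:
    # Print the computed length next to the length reported in the table.
    print(clone, orf_length_aa(start, stop), reported)
```

Under that assumption, each computed length agrees with the length reported for the corresponding protein sequence.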



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 91B.



Table 91B. Comparison of NOV91a against NOV91b through NOV91d.

Protein Sequence   NOV91a Residues/Match Residues   Identities/Similarities for the Matched Region
NOV91b             34..561 / 1..534                 494/534 (92%) / 494/534 (92%)
NOV91c             34..561 / 1..528                 494/528 (93%) / 494/528 (93%)
NOV91d             1..561 / 1..561                  520/567 (91%) / 520/567 (91%)



Further analysis of the NOV91a protein yielded the following properties shown in
Table 91C.



Table 91C. Protein Sequence Properties NOV91a 


PSort analysis:
0.4500 probability located in cytoplasm; 0.3000 probability located in microbody
(peroxisome); 0.1000 probability located in mitochondrial matrix space; 0.1000
probability located in lysosome (lumen)

SignalP analysis:
No Known Signal Sequence Predicted







A search of the NOV91a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 91D.



Table 91D. Geneseq Results for NOV91a

Geneseq Identifier
Protein/Organism/Length [Patent #, Date]
NOV91a Residues/Match Residues
Identities/Similarities for the Matched Region
Expect Value


AAY27431
Murine RIP-associated protein (RAP-2) splice variant (NEMO full) - Mus sp, 412 aa. [WO9947672-A1, 23-SEP-1999]
232..560
90..412
100/341 (29%)
184/341 (53%)
3e-33


AAY27430
Human RIP-associated protein (RAP-2) - Homo sapiens, 416 aa. [WO9947672-A1, 23-SEP-1999]
229..558
88..416
101/337 (29%)
184/337 (53%)
6e-32


AAU84350
Protein MYH11 differentially expressed in breast cancer tissue - Homo sapiens, 1857 aa. [WO200210436-A2, 07-FEB-2002]
29..502
1095..1596
117/520 (22%)
222/520 (42%)
1e-15


ABG06505
Novel human diagnostic protein #6496 - Homo sapiens, 2633 aa. [WO200175067-A2, 11-OCT-2001]
29..487
1054..1546
117/511 (22%)
230/511 (44%)
1e-15


AAM41000
Human polypeptide SEQ ID NO 5931 - Homo sapiens, 1988 aa. [WO200153312-A1, 26-JUL-2001]
22..502
983..1526
126/563 (22%)
236/563 (41%)
2e-15



In a BLAST search of public sequence databases, the NOV91a protein was found to
have homology to the proteins shown in the BLASTP data in Table 91E.



Table 91E. Public BLASTP Results for NOV91a



Protein
Accession
Number

Protein/Organism/Length 


NOV91a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


AAH32762 


Similar to optineurin - Homo sapiens 
(Human), 571 aa. 


1..561
1..571 


555/571 (97%) 
556/571 (97%) 


0.0 


Q96CV9


Tumor necrosis factor alpha-inducible 
cellular protein containing leucine zipper 
domains, huntingtin interacting protein L, 
transcription factor IIIA-interacting protein
(Optineurin isoform 1) (Optineurin 
isoform 2) (Optineurin isoform 3) - Homo 
sapiens (Human), 577 aa. 


1..561 
1..577 


555/577 (96%) 
556/577 (96%) 


0.0 


Q9Y218 


FIP2 - Homo sapiens (Human), 577 aa. 


1..561
1..577 


552/577 (95%) 
554/577 (95%) 


0.0 


Q9BGR3 


Hypothetical 65.1 kDa protein - Macaca 
fascicularis (Crab eating macaque) 
(Cynomolgus monkey), 571 aa. 


1..561
1..571 


538/571 (94%) 
547/571 (95%) 


0.0 


Q95KA2 


Hypothetical 62.9 kDa protein - Macaca 
fascicularis (Crab eating macaque) 
(Cynomolgus monkey), 550 aa. 


16..561 
5..550 


526/546 (96%) 
534/546 (97%) 


0.0 



PFam analysis predicts that the NOV91a protein contains the domains shown in the 
Table 91F. 



Table 91F. Domain Analysis of NOV91a

Pfam Domain   NOV91a Match Region   Identities/Similarities for the Matched Region   Expect Value
zf-C2H2       537..559              6/24 (25%) / 17/24 (71%)                         0.51



Example 92. 



The NOV92 clone was analyzed, and the nucleotide and encoded polypeptide
sequences are shown in Table 92A.



Table 92A. NOV92 Sequence Analysis




SEQ ID NO: 337


750 bp


NOV92a, 
CG97966-01 DNA 
Sequence 


GTGGCTGCTCGGGACCACCCGAACCCGCGGCCATGGCCCCGGCCGCCGCCAGCCCCCCGGAGG 


TGATCCGCGCGGCGCAGAAGGACGAGTACTACCGCGGTGGGCTGCGGAGCGCGGCGGGCGGCG
CCCTGCACAGCCTGGCGGGTGCGGGGAAGTGGCTGGAGTGGAGGAAGGAGGTTGAGCTGCTCT 
CAGATGTGGCCTACTTTGGCCTCACCACACTTGCAGGCTACCAGACCCTGGGGGAGGAGTACG 
TCAGCATCATCCAGGTGGACCCATCGCGGATACATGTGCCCTCCTCGCTGCGCCGTGGCGTGC 
TGGTGACGCTGCATGCCGTCCTGCCCTACCTGCTGGACAAGGCCCTGCTCCCCCTGGAGCAGG
AGCTGCAGGCTGACCCCGACAGTGGGCGACCCTTGCAGGGGAGCCTGGGGCCAGGTGGGCGTG 
GCTGCTCAGGGGCGCGGCGCTGGATGCGTCACCACACGGCCACCCTGACTGAGCAGCAGAGGA 
GGGCGCTGCTGCGGGCGGTCTTCGTCCTCAGACAGGGCCTCGCCTGCCTCCAGCGGCTACATG
TTGCCTGGTTTTACATCCACCTGTTCTGCTGGGAGTGCATCACCGCGTGGTGCAGCAGCAAGG 
CGGAGTGTCCCCTCTGCCGGGAGAAGTTCCCTCCCCAGAAGCTCATCTACCTTCGGCACTACC 
GCTGAGCCGGCGCCCGGGTGGGCCTGGACACAGATGACCTCTACGGGAGTCTGAACG 




ORF Start: ATG at 33


ORF Stop: TGA at 696


SEQ ID NO: 338


221 aa


MW at 24759.4kD


NOV92a,

CG97966-01 Protein
Sequence 


MAPAAASPPEVIRAAQKDEYYRGGLRSAAGGALHSLAGAGKWLEWRKEVELLSDVAYFGLTTL 
AGYQTLGEEYVSIIQVDPSRIHVPSSLRRGVLVTLHAVLPYLLDKALLPLEQELQADPDSGRP 
LQGSLGPGGRGCSGARRWMRHHTATLTEQQRRALLRAVFVLRQGLACLQRLHVAWFYIHLFCW
ECITAWCSSKAECPLCREKFPPQKLIYLRHYR 




SEQ ID NO: 339


489 bp 




NOV92b, 
CG97966-03 DNA 
Sequence 


GTGGCTGCTCGGGACCACCCGAACCCGCGGCCATGGCCCCGGCCGCCGCCAGCCCCCCGGAGG 


TGATCCGCGCGGCGCAGAAGGACGAGTACTACCGCGGTGGGCTGCGGAGCGCGGCGGGCGGCG 
CCCTGCACAGCCTGGCGGGTGCGAGGAAGTGGCTGGAGTGGAGGAAGGAGGTTGAGCTGCTCT 
CAGATGTGGCCTACTTTGGCCTCACCACACTTGCAGGCTACCAGACCCTGGGGGAGGAGAGAG 
CCGTTTCCAGAAACCCCCTGTGCACCCTGTGCCTGGAGGAGCGCAGGCACCCAACAGCCACGC 
CCTGCGGCCACCTGTTCTGCTGGGAGTGCATCACCGCGTGGTGCAGCAGCAAGGCGGAGTGTC 
CCCTCTGCCGGGAGAAGTTCCCTCCCCAGAAGCTCATCTACCTTCGGCACTACCGCTGAGCCG 
GCGCCCGGGTGGGCCTGGACACAGATGACCTCTACGGGAGTCTGAACG 




ORF Start: ATG at 33 






ORF Stop: TGA at 435 




SEQ ID NO: 340 


134 aa


MW at 15069.1kD


NOV92b, 

CG97966-03 Protein 
Sequence 


MAPAAASPPEVIRAAQKDEYYRGGLRSAAGGALHSLAGARKWLEWRKEVELLSDVAYFGLTTL 
AGYQTLGEERAVSRNPLCTLCLEERRHPTATPCGHLFCWECITAWCSSKAECPLCREKFPPQK 
LIYLRHYR 




SEQ ID NO: 341 


1267 bp 




NOV92c, 
CG97966-02 DNA 
Sequence 



GTGGCTGCTCGGGACCACCCGAACCCGCGGCCATGGCCCCGGCCGCCGCCAGCCCCCCGGAGG 


TGATCCGCGCGGCGCAGAAGGACGAGTACTACCGCGGTGGGCTGCGGAGCGCGGCGGGCGGCG 
CCCTGCACAGCCTGGCGGGTGCGAGGAAGTGGCTGGAGTGGAGGAAGGAGGTTGAGCTGCTCT 
CAGATGTGGCCTACTTTGGCCTCACCACACTTGCAGGCTACCAGACCCTGGGGGAGGAGTACG 
TCAGCATCATCCAGGTGGACCCATCGCGGATACATGTGCCCTCCTCGCTGCGCCGTGGCGTGC 
TGGTGACGCTGCATGCCGTCCTGCCCTACCTGCTGGACAAGGCCCTGCTCCCCCTGGAGCAGG 
AGCTGCAGGCTGACCCCGACAGTGGGCGACCCTTGCAGGGGAGCCTGGGGCCAGGTGGGCGTG 
GCTGCTCAGGGGCGCGGCGCTGGATGCGTCACCACACGGCCACCCTGACTGAGCAGCAGAGGA 
GGGCGCTGCTGCGGGCGGTCTTCGTCCTCAGACAGGGCCTCGCCTGCCTCCAGCGGCTACATG 
TTGCCTGGTTTTACATCCACGGTGTCTTCTACCACCTGGCCAAGAGGCTCACGGGGATCACGT 
ACCTCCGTGTCCGCAGCCTGCCCGGAGAGGACCTGAGGGCCCGTGTTAGCTACAGGCTGCTGG 
GGGTCATCTCACTGCTGCACCTGGTGCTGTCCATGGGGCTGCAGCTGTACGGTTTCAGGCAGC 
GGCAGCGAGCCAGGAAGGAGTGGAGGCTGCACCGCGGCCTGTCTCACCGCAGGGCCTCCTTGG 
AGGAGAGAGCCGTTTCCAGAAACCCCCTGTGCACCCTGTGCCTGGAGGAGCGCAGGCACCCAA 
CAGCCACGCCCTGCGGCCACCTGTTCTGCTGGGAGTGCATCACCGCGTGGTGCAGCAGCAAGG
CGGAGTGTCCCCTCTGCCGGGAGAAGTTCCCTCCCCAGAAGCTCATCTACCTTCGGCACTACC 
GCTGAGCCGGCGCCCGGGTGGGCCTGGACACAGATGACCTCTACGGGAGTCTAAACGCCAAGA 




TTTAGTCTCAGGATTAACCTTGCTTGCACAGAAGTTAGAACACTCTCAGTTTTTTGTCATGTA 




AGATACTAACCTAGCCACCCTGGGAGAGAACAGAAAGCTGTCCCTGGCTGCGCTTTCTCAGCC 




CTGGGAGGGGCGCCTGAACCCAGAACATTTCCCTAACCCCAACCTGGTAGGACTCAGCCACTT 




CTTCAGG 




ORF Start: ATG at 33 


ORF Stop: TGA at 1011




SEQ ID NO: 342


326 aa


MW at 37068.6kD


NOV92c, 

CG97966-02 Protein 
Sequence 


MAPAAASPPEVIRAAQKDEYYRGGLRSAAGGALHSLAGARKWLEWRKEVELLSDVAYFGLTTL 
AGYQTLGEEYVSIIQVDPSRIHVPSSLRRGVLVTLHAVLPYLLDKALLPLEQELQADPDSGRP
LQGSLGPGGRGCSGARRWMRHHTATLTEQQRRALLRAVFVLRQGLACLQRLHVAWFYIHGVFY 
HLAKRLTGITYLRVRSLPGEDLRARVSYRLLGVISLLHLVLSMGLQLYGFRQRQRARKEWRLH 
RGLSHRRASLEERAVSRNPLCTLCLEERRHPTATPCGHLFCWECITAWCSSKAECPLCREKFP 
PQKLIYLRHYR 




SEQ ID NO: 343 


2059 bp


NOV92d,
CG97966-04 DNA
Sequence


GGCACGAGGCTGCTCGGGACCACCCGAACCCGCGGCCATGGCCCCGGCCGCCGCCAGCCCCCC
GGAGGTGATCCGCGCGGCGCAGAAGGACGAGTACTACCGCGGTGGGCTGCGGAGCGCGGCGGG
CGGCGCCCTGCACAGCCTGGCGGGTGCGAGGAAGTGGCTGGAGTGGAGGAAGGAGGTTGAGCT 
GCTCTCAGATGTGGCCTACTTTGGCCTCACCACACTTGCAGGCTACCAGACCCTGGGGGAGGA 
GTACGTCAGCATCATCCAGGTGGACCCATCGCGGATACATGTGCCCTCCTCGCTGCGCCGTGG 
CGTGCTGGTGACGCTGCATGCCGTCCTGCCCTACCTGCTGGACAAGGCCCTGCTCCCCCTGGA 
GCAGGAGCTGCAGGCTGACCCCGACAGTGGGCGACCCTTGCAGGGGAGCCTGGGGCCAGGTGG 
GCGTGGCTGCTCAGGGGCGCGGCGCTGGATGCGTCACCACACGGCCACCCTGACTGAGCAGCA 
GAGGAGGGCGCTGCTGCGGGCGGTCTTCGTCCTCAGACAGGGCCTCGCCTGCCTCCAGCGGCT 
ACATGTTGCCTGGTTTTACATCCACGGTGTCTTCTACCACCTGGCCAAGAGGCTCACGGGGAT 

CACGTACGCGCTGAGGCCAGATCCCCTCAGGGTCCTGATGAGTGTGGCGCCATCTGCCTTACA 
GCTCCGTGTCCGCAGCCTGCCCGGAGAGGACCTGAGGGCCCGTGTTAGCTACAGGCTGCTGGG 
GGTCATCTCACTGCTGCACCTGGTGCTGTCCATGGGGCTGCAGCTGTACGGTTTCAGGCAGCG 
GCAGCGAGCCAGGAAGGAGTGGAGGCTGCACCGCGGCCTGTCTCACCGCAGGGCCTCCTTGGA 
GGAGAGAGCCGTTTCCAGAAACCCCCTGTGCACCCTGTGCCTGGAGGAGCGCAGGCACCCAAC 
AGCCACGCCCTGCGGCCACCTGTTCTGCTGGGAGTGCATCACCGCGTGGTGCAGCAGCAAGGC 
GGAGTGTCCCCTCTGCCGGGAGAAGTTCCCTCCCCAGAAGCTCATCTACCTTCGGCACTACCG 
CTGAGCCGGCGCCCGGGTGGGCCTGGACACAGATGACCTCTACGGGAGTCTGAACGCCAAGAT 


TTAGTCTCAGGATTAACCTTGCTTGCACAGAAGTTAGAACACTCTCAGTTTTTTGTCATGTAA 


GATACTAACCTAGCCACCCTGGGAGAGAACAGAAAGCTGTCCCTGGCTGCACTTTCTCAGCCC 


TGGGAGGGGCGCCTGAACCCAGAACATTTCCCTAACCCCAACCTGGTAGGACTCAGCCACTTC


TTCAGGAATTTCACTTATTTGGACGGGATTTTAGGTTTCCCTCCCTTCCCCAAACCATACAGT 


TGAGAAGTAATTCAGAAGTAGGCCAGAAGACACTTTATTCGTTTATATTGTGAGAAAACAGCC 


CCATCAGGCTTGTGTTAAGGCAATGGACTGAATGAGTGCGTGCTGGGTGGGGTGGGGCACGGA 


GGCTGGCGGGTTGCTTCAGCCAGTGCAGTGAGAACAGCAGCCCCACGGCCCCATGGGAGGCGG 


CGCTGCTCTCCCCGAGGGCGGCTGGGCAGAGCACATCCCCCAGGACTTGATGACCACACGGGG


CAGAGAGAAACCAACCAAGGCCAGCACCTCCGTCGGAAGCATTTGGCACACACACCTTCAATA 


CACGTCAAGGTCGCTTCCAGTTTTAGAAAACAGAAATCTGCATCTCAGCCTGAGACGCACAGA 


GAGGTCTCTTCCTGACCCAGACGCACTCACGAGCCAGGTCCTGGGGGTATGGGGGCTGCCAGG 


GGCGCCCGAGCCCTCTCCTGGGGGGCCTGCTGGGCAGGCGACCTGCTGACCCACGGTCACTGC 


TGTGTTCAGCCCCTCAGCTCGGCCCCAGCCTATTTCCCGCCTCCATTTGATGTTTCCAGGTTT 


TCAAAACTGCATTTAACCTGCGCCAGAGAGTTCACCGTAGGCATCTTTAATAAACTAACTCCA 


GCAAAATGTGGGTACGTTACTAAAAAAAAAAAAAAAAAAAAAA 




ORF Start: ATG at 38


ORF Stop: TGA at 1073


SEQ ID NO: 344


345 aa


MW at 39085.0kD


NOV92d,

CG97966-04 Protein 
Sequence 


MAPAAASPPEVIRAAQKDEYYRGGLRSAAGGALHSLAGARKWLEWRKEVELLSDVAYFGLTTL 
AGYQTLGEEYVSIIQVDPSRIHVPSSLRRGVLVTLHAVLPYLLDKALLPLEQELQADPDSGRP
LQGSLGPGGRGCSGARRWMRHHTATLTEQQRRALLRAVFVLRQGLACLQRLHVAWFYIHGVFY 
HLAKRLTGITYALRPDPLRVLMSVAPSALQLRVRSLPGEDLRARVSYRLLGVISLLHLVLSMG 
LQLYGFRQRQRARKEWRLHRGLSHRRASLEERAVSRNPLCTLCLEERRHPTATPCGHLFCWEC 
ITAWCSSKAECPLCREKFPPQKLIYLRHYR 
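

The relationship between a listed DNA sequence, its ORF start position, and the encoded polypeptide can be illustrated by translating the first line of the NOV92a DNA sequence from the ATG at position 33. This is a minimal sketch, assuming the standard genetic code and a 1-based ORF start coordinate; the names `CODON_TABLE` and `translate` are illustrative and do not come from the original disclosure.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, start):
    """Translate dna from the 1-based position `start` until a stop codon or the end."""
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# First line of the NOV92a DNA sequence (SEQ ID NO: 337) as listed in Table 92A.
first_line = "GTGGCTGCTCGGGACCACCCGAACCCGCGGCCATGGCCCCGGCCGCCGCCAGCCCCCCGGAGG"
print(translate(first_line, 33))
```

The translated residues correspond to the first ten residues of the NOV92a protein sequence (MAPAAASPPE).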



Sequence comparison of the above protein sequences yields the following sequence 
relationships shown in Table 92B. 



Table 92B. Comparison of NOV92a against NOV92b through NOV92d.

Protein Sequence   NOV92a Residues/Match Residues   Identities/Similarities for the Matched Region
NOV92b             1..72 / 1..72                    53/72 (73%) / 53/72 (73%)
NOV92c             1..185 / 1..185                  125/185 (67%) / 125/185 (67%)
NOV92d             1..185 / 1..185                  125/185 (67%) / 125/185 (67%)
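

The percentages in Table 92B follow from the ratio of matched residues to the length of the matched region. The sketch below assumes the reported values are truncated rather than rounded, which is an inference from the tabulated numbers (for example, 125/185 is about 67.6% but is reported as 67%) and is not stated in the text; the function name `percent_truncated` is illustrative.

```python
def percent_truncated(matches, aligned_length):
    """Whole-number percentage of matches, truncated toward zero (assumed convention)."""
    return (100 * matches) // aligned_length


# (matches, aligned length, reported percent) taken from Table 92B
TABLE_92B = {
    "NOV92b": (53, 72, 73),
    "NOV92c": (125, 185, 67),
    "NOV92d": (125, 185, 67),
}

for clone, (matches, length, reported) in TABLE_92B.items():
    # Print the computed percentage next to the reported one.
    print(clone, percent_truncated(matches, length), reported)
```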






Further analysis of the NOV92a protein yielded the following properties shown in
Table 92C.



Table 92C. Protein Sequence Properties NOV92a

PSort analysis:
0.4500 probability located in cytoplasm; 0.3774 probability located in microbody
(peroxisome); 0.2542 probability located in lysosome (lumen); 0.1000 probability
located in mitochondrial matrix space

SignalP analysis:
No Known Signal Sequence Predicted



A search of the NOV92a protein against the Geneseq database, a proprietary database
that contains sequences published in patents and patent publications, yielded several
homologous proteins shown in Table 92D.



Table 92D. Geneseq Results for NOV92a

Geneseq Identifier
Protein/Organism/Length [Patent #, Date]
NOV92a Residues/Match Residues
Identities/Similarities for the Matched Region
Expect Value


AAB51471
Human secreted protein BLAST search protein SEQ ID NO: 148 - Homo sapiens, 55 aa. [WO200058495-A1, 05-OCT-2000]
185..221
19..55
37/37 (100%)
37/37 (100%)
3e-18


AAB51470
Human secreted protein BLAST search protein SEQ ID NO: 147 - Homo sapiens, 55 aa. [WO200058495-A1, 05-OCT-2000]
185..221
19..55
37/37 (100%)
37/37 (100%)
3e-18


AAB51469
Human secreted protein BLAST search protein SEQ ID NO: 146 - Homo sapiens, 55 aa. [WO200058495-A1, 05-OCT-2000]
185..221
19..55
37/37 (100%)
37/37 (100%)
3e-18


AAB51468
Human secreted protein BLAST search protein SEQ ID NO: 145 - Homo sapiens, 55 aa. [WO200058495-A1, 05-OCT-2000]
185..221
19..55
37/37 (100%)
37/37 (100%)
3e-18


AAU93078
Arabidopsis transcription factor #116 - Arabidopsis thaliana, 381 aa. [WO200215675-A1, 28-FEB-2002]
6..108
29..131
39/103 (37%)
59/103 (56%)
6e-13



In a BLAST search of public sequence databases, the NOV92a protein was found to
have homology to the proteins shown in the BLASTP data in Table 92E.



Table 92E. Public BLASTP Results for NOV92a



Protein
Accession
Number

Protein/Organism/Length 


NOV92a 
Residues/ 
Match 
Residues 


Identities/ 
Similarities for 
the Matched 
Portion 


Expect 
Value 


O60683
Peroxisome assembly protein 10 (Peroxin-10) - Homo sapiens (Human), 326 aa.
1..185
1..185
184/185 (99%)
184/185 (99%)
e-103


AAM64667
Putative peroxisome assembly protein PER8 - Arabidopsis thaliana (Mouse-ear cress), 381 aa.
6..108
29..131
39/103 (37%)
59/103 (56%)
2e-12


Q9SYU4
Zinc-binding peroxisomal integral membrane protein (Putative peroxisome assembly protein PER8) - Arabidopsis thaliana (Mouse-ear cress), 381 aa.
6..108
29..131
39/103 (37%)
59/103 (56%)
2e-12


Q9M400
Pex10p - Arabidopsis thaliana (Mouse-ear cress), 381 aa.
6..108
29..131
39/103 (37%)
59/103 (56%)
2e-12


Q94LL6
Putative zinc-binding peroxisomal integral membrane protein - Oryza sativa (Rice), 382 aa.
6..108
31..133
38/103 (36%)
57/103 (54%)
4e-11



PFam analysis predicts that the NOV92a protein contains the domains shown in the 
Table 92F. 



Table 92F. Domain Analysis of NOV92a

Pfam Domain   NOV92a Match Region   Identities/Similarities for the Matched Region   Expect Value
zf-C3HC4      185..205              9/29 (31%) / 16/29 (55%)                         0.011



Example B: Sequencing Methodology and Identification of NOVX Clones 

1. GeneCalling™ Technology: This is a proprietary method of performing
differential gene expression profiling between two or more samples developed at CuraGen
and described by Shimkets, et al., "Gene expression analysis by transcript profiling coupled
to a gene database query" Nature Biotechnology 17:798-803 (1999). cDNA was derived from
various human samples representing multiple tissue types, normal and diseased states,
physiological states, and developmental states from different donors. Samples were obtained
as whole tissue, primary cells or tissue cultured primary cells or cell lines. Cells and cell lines
may have been treated with biological or chemical agents that regulate gene expression, for
example, growth factors, chemokines or steroids. The cDNA thus derived was then digested
with up to as many as 120 pairs of restriction enzymes, and pairs of linker-adaptors specific
for each pair of restriction enzymes were ligated to the appropriate end. The restriction
digestion generates a mixture of unique cDNA gene fragments. Limited PCR amplification is
performed with primers homologous to the linker-adaptor sequence where one primer is
biotinylated and the other is fluorescently labeled. The doubly labeled material is isolated
and the fluorescently labeled single strand is resolved by capillary gel electrophoresis. A
computer algorithm compares the electropherograms from an experimental and control group
for each of the restriction digestions. This and additional sequence-derived information is
used to predict the identity of each differentially expressed gene fragment using a variety of
genetic databases. The identity of the gene fragment is confirmed by additional,
gene-specific competitive PCR or by isolation and sequencing of the gene fragment.
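
The digestion step described above can be sketched in silico. The following is a minimal illustration, not CuraGen's proprietary implementation: it assumes a single enzyme pair (EcoRI, G^AATTC, and BamHI, G^GATCC, chosen only as familiar examples), a made-up toy sequence, and a hypothetical function name `double_digest`. It keeps only the fragments bounded by one site of each enzyme, on the assumption that these are the fragments that would carry both linker-adaptors and therefore both labels.

```python
import re


def double_digest(seq, site_a, cut_a, site_b, cut_b):
    """Return (start, end, length) for fragments flanked by one cut of each enzyme.

    site_a / site_b are recognition sequences; cut_a / cut_b are the offsets of the
    cut within each site (e.g. 1 for G^AATTC). Coordinates are 0-based cut positions.
    """
    cuts = []
    for label, site, offset in (("A", site_a, cut_a), ("B", site_b, cut_b)):
        # A lookahead pattern finds every occurrence, including overlapping ones.
        for match in re.finditer("(?=" + re.escape(site) + ")", seq):
            cuts.append((match.start() + offset, label))
    cuts.sort()

    fragments = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2:  # fragment has one end from each enzyme
            fragments.append((pos1, pos2, pos2 - pos1))
    return fragments


# Toy cDNA (hypothetical): EcoRI site, BamHI site, EcoRI site.
toy_cdna = "AAAGAATTCAAAAAGGATCCAAAGAATTCAA"
print(double_digest(toy_cdna, "GAATTC", 1, "GGATCC", 1))
```

In the method as described, the lengths of such fragments are what capillary gel electrophoresis resolves, and they are then compared between experimental and control samples.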

2. SeqCalling™ Technology: cDNA was derived from various human samples
representing multiple tissue types, normal and diseased states, physiological states, and
developmental states from different donors. Samples were obtained as whole tissue, primary
cells or tissue cultured primary cells or cell lines. Cells and cell lines may have been treated
with biological or chemical agents that regulate gene expression, for example, growth factors,
chemokines or steroids. The cDNA thus derived was then sequenced using CuraGen's
proprietary SeqCalling technology. Sequence traces were evaluated manually and edited for
corrections if appropriate. cDNA sequences from all samples were assembled together,
sometimes including public human sequences, using bioinformatic programs to produce a
consensus sequence for each assembly. Each assembly is included in CuraGen Corporation's
database. Sequences were included as components for assembly when the extent of identity
with another component was at least 95% over 50 bp. Each assembly represents a gene or
portion thereof and includes information on variants, such as splice forms, single nucleotide
polymorphisms (SNPs), insertions, deletions and other sequence variations.
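
The inclusion criterion stated above (at least 95% identity over 50 bp) can be expressed as a simple check. This is a minimal sketch under stated assumptions: the two overlapping regions are taken to be already aligned without gaps and of equal length, and "over 50 bp" is read as an overlap of at least 50 bp. The function name `can_join` and its parameters are hypothetical and do not describe the actual assembly software.

```python
def can_join(overlap_a, overlap_b, min_length=50, min_identity=0.95):
    """True if two ungapped, equal-length overlaps meet the length and identity thresholds."""
    if len(overlap_a) != len(overlap_b):
        raise ValueError("overlaps must be aligned to equal length")
    length = len(overlap_a)
    if length < min_length:
        return False
    # Count positions where the two sequences carry the same base.
    matches = sum(x == y for x, y in zip(overlap_a, overlap_b))
    return matches / length >= min_identity


reference = "ACGT" * 13  # 52 bp toy overlap (hypothetical)
two_mismatches = "T" + reference[1:10] + "A" + reference[11:]
print(can_join(reference, two_mismatches))  # 50 of 52 positions match
```

A real assembler would also handle gapped alignments and sequence quality; the sketch only makes the numeric threshold concrete.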



3. PathCalling Technology: The NOVX nucleic acid sequences are derived
by laboratory screening of cDNA library by the two-hybrid approach. cDNA fragments
covering either the full length of the DNA sequence, or part of the sequence, or both, are
sequenced. In silico prediction was based on sequences available in CuraGen Corporation's
proprietary sequence databases or in the public human sequence databases, and provided
either the full length DNA sequence, or some portion thereof.

The laboratory screening was performed using the methods summarized below:
cDNA libraries were derived from various human samples representing multiple
tissue types, normal and diseased states, physiological states, and developmental states from
different donors. Samples were obtained as whole tissue, primary cells or tissue cultured
primary cells or cell lines. Cells and cell lines may have been treated with biological or
chemical agents that regulate gene expression, for example, growth factors, chemokines or
steroids. The cDNA thus derived was then directionally cloned into the appropriate
two-hybrid vector (Gal4-activation domain (Gal4-AD) fusion). Such cDNA libraries as well
as commercially available cDNA libraries from Clontech (Palo Alto, CA) were then
transferred from E. coli into a CuraGen Corporation proprietary yeast strain (disclosed in
U.S. Patents 6,057,101 and 6,083,693, incorporated herein by reference in their entireties).

Gal4-binding domain (Gal4-BD) fusions of a CuraGen Corporation proprietary library
of human sequences were used to screen multiple Gal4-AD fusion cDNA libraries, resulting in
the selection of yeast hybrid diploids in each of which the Gal4-AD fusion contains an
individual cDNA. Each sample was amplified by the polymerase chain reaction (PCR)
using non-specific primers at the cDNA insert boundaries. Such PCR product was sequenced;
sequence traces were evaluated manually and edited for corrections if appropriate. cDNA
sequences from all samples were assembled together, sometimes including public human
sequences, using bioinformatic programs to produce a consensus sequence for each assembly.
Each assembly is included in CuraGen Corporation's database. Sequences were included as
components for assembly when the extent of identity with another component was at least
95% over 50 bp. Each assembly represents a gene or portion thereof and includes information
on variants, such as splice forms, single nucleotide polymorphisms (SNPs), insertions,
deletions and other sequence variations.

Physical clone: the cDNA fragment derived by the screening procedure, covering the
entire open reading frame is, as a recombinant DNA, cloned into pACT2 plasmid (Clontech)
used to make the cDNA library. The recombinant plasmid is inserted into the host and
selected by the yeast hybrid diploid generated during the screening procedure by the mating
of both CuraGen Corporation proprietary yeast strains N106' and YULH (U.S. Patents
6,057,101 and 6,083,693).

4. RACE: Techniques based on the polymerase chain reaction, such as rapid
amplification of cDNA ends (RACE), were used to isolate or complete the predicted
sequence of the cDNA of the invention. Usually multiple clones were sequenced from one or
more human samples to derive the sequences for fragments. Various human tissue samples
from different donors were used for the RACE reaction. The sequences derived from these
procedures were included in the SeqCalling Assembly process described in preceding
paragraphs.

5. Exon Linking: The NOVX target sequences identified in the present
invention were subjected to the exon linking process to confirm the sequence. PCR primers
were designed by starting at the most upstream sequence available, for the forward primer,
and at the most downstream sequence available for the reverse primer. In each case, the
sequence was examined, walking inward from the respective termini toward the coding
sequence, until a suitable sequence that is either unique or highly selective was encountered,
or, in the case of the reverse primer, until the stop codon was reached. Such primers were
designed based on in silico predictions for the full length cDNA, part (one or more exons) of
the DNA or protein sequence of the target sequence, or by translated homology of the
predicted exons to closely related human sequences or to sequences from other species. These
primers were then employed in PCR amplification based on the following pool of human cDNAs:
adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal
lung, heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta,
prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis,
thyroid, trachea, uterus. Usually the resulting amplicons were gel purified, cloned and
sequenced to high redundancy. The PCR product derived from exon linking was cloned into
the pCR2.1 vector from Invitrogen. The resulting bacterial clone has an insert covering the
entire open reading frame cloned into the pCR2.1 vector. The resulting sequences from all
clones were assembled with themselves, with other fragments in CuraGen Corporation's
database and with public ESTs. Fragments and ESTs were included as components for an
assembly when the extent of their identity with another component of the assembly was at
least 95% over 50 bp. In addition, sequence traces were evaluated manually and edited for
corrections if appropriate. These procedures provide the sequence reported herein.

6. Physical Clone: Exons were predicted by homology and the intron/exon
boundaries were determined using standard genetic rules. Exons were further selected and
refined by means of similarity determination using multiple BLAST (for example, tBlastN,
BlastX, and BlastN) searches, and, in some instances, GeneScan and Grail. Expressed
sequences from both public and proprietary databases were also added when available to
further define and complete the gene sequence. The DNA sequence was then manually
corrected for apparent inconsistencies thereby obtaining the sequences encoding the
full-length protein.

The PCR product derived by exon linking, covering the entire open reading frame, 
was cloned into the pCR2.1 vector from Invitrogen to provide clones used for expression and 
screening purposes. 

Example C: Quantitative expression analysis of clones in various cells and tissues

The quantitative expression of various clones was assessed using microtiter plates
containing RNA samples from a variety of normal and pathology-derived cells, cell lines and
tissues using real time quantitative PCR (RTQ PCR). RTQ PCR was performed on an
Applied Biosystems ABI PRISM® 7700 or an ABI PRISM® 7900 HT Sequence Detection
System. Various collections of samples are assembled on the plates, and referred to as Panel
1 (containing normal tissues and cancer cell lines), Panel 2 (containing samples derived from
tissues from normal and cancer sources), Panel 3 (containing cancer cell lines), Panel 4
(containing cells and cell lines from normal tissues and cells related to inflammatory
conditions), Panel 5D/5I (containing human tissues and cell lines with an emphasis on
metabolic diseases), AI_comprehensive_panel (containing normal tissue and samples from
autoimmune/autoinflammatory diseases), Panel CNSD.01 (containing samples from normal
and diseased brains) and CNS_neurodegeneration_panel (containing samples from normal
and Alzheimer's diseased brains).

RNA integrity from all samples is controlled for quality by visual assessment of
agarose gel electropherograms using 28S and 18S ribosomal RNA staining intensity ratio as a
guide (2:1 to 2.5:1 28S:18S) and the absence of low molecular weight RNAs that would be
indicative of degradation products. Samples are controlled against genomic DNA
contamination by RTQ PCR reactions run in the absence of reverse transcriptase using probe
and primer sets designed to amplify across the span of a single exon.

First, the RNA samples were normalized to reference nucleic acids such as
constitutively expressed genes (for example, β-actin and GAPDH). Normalized RNA (5 µl)
was converted to cDNA and analyzed by RTQ-PCR using One Step RT-PCR Master Mix
Reagents (Applied Biosystems; Catalog No. 4309169) and gene-specific primers according to
the manufacturer's instructions.

In other cases, non-normalized RNA samples were converted to single strand cDNA
(sscDNA) using Superscript II (Invitrogen Corporation; Catalog No. 18064-147) and random
hexamers according to the manufacturer's instructions. Reactions containing up to 10 µg of
total RNA were performed in a volume of 20 µl and incubated for 60 minutes at 42 °C. This
reaction can be scaled up to 50 µg of total RNA in a final volume of 100 µl. sscDNA samples
are then normalized to reference nucleic acids as described previously, using 1X TaqMan®
Universal Master mix (Applied Biosystems; catalog No. 4324020), following the
manufacturer's instructions.

Probes and primers were designed for each assay according to Applied Biosystems
Primer Express Software package (version I for Apple Computer's Macintosh Power PC) or a
similar algorithm using the target sequence as input. Default settings were used for reaction
conditions and the following parameters were set before selecting primers: primer
concentration = 250 nM, primer melting temperature (Tm) range = 58 °C-60 °C, primer optimal
Tm = 59 °C, maximum primer difference = 2 °C, probe does not have 5' G, probe Tm must be
10 °C greater than primer Tm, amplicon size 75 bp to 100 bp. The probes and primers selected
(see below) were synthesized by Synthegen (Houston, TX, USA). Probes were double
purified by HPLC to remove uncoupled dye and evaluated by mass spectroscopy to verify
coupling of reporter and quencher dyes to the 5' and 3' ends of the probe, respectively. Their
final concentrations were: forward and reverse primers, 900 nM each, and probe, 200 nM.
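The selection parameters listed above can be expressed as a simple screening check on a candidate assay. The sketch below is illustrative only: the class and function names are hypothetical, the Tm values are assumed to have been computed elsewhere, and "10 °C greater than primer Tm" is interpreted here as at least 10 °C above the higher of the two primer Tm values. It does not reproduce the Primer Express algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssayDesign:
    forward_tm: float   # forward primer Tm in deg C (computed elsewhere)
    reverse_tm: float   # reverse primer Tm in deg C
    probe_tm: float     # probe Tm in deg C
    probe_seq: str      # probe sequence written 5' to 3'
    amplicon_len: int   # amplicon length in bp


def design_violations(d: AssayDesign) -> List[str]:
    """List which of the stated selection parameters a candidate design breaks."""
    problems = []
    for name, tm in (("forward", d.forward_tm), ("reverse", d.reverse_tm)):
        if not 58.0 <= tm <= 60.0:
            problems.append(f"{name} primer Tm outside 58-60 C")
    if abs(d.forward_tm - d.reverse_tm) > 2.0:
        problems.append("primer Tm difference exceeds 2 C")
    if d.probe_seq[:1].upper() == "G":
        problems.append("probe has a 5' G")
    # Assumption: the probe must sit at least 10 C above the warmer primer.
    if d.probe_tm < max(d.forward_tm, d.reverse_tm) + 10.0:
        problems.append("probe Tm less than 10 C above primer Tm")
    if not 75 <= d.amplicon_len <= 100:
        problems.append("amplicon size outside 75-100 bp")
    return problems
```

An empty list means the candidate satisfies every listed parameter; primer concentration is a reaction setting rather than a sequence property, so it is not checked here.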

PCR conditions: When working with RNA samples, normalized RNA from each
tissue and each cell line was spotted in each well of either a 96 well or a 384-well PCR plate
(Applied Biosystems). PCR cocktails included either a single gene specific probe and primers
set, or two multiplexed probe and primers sets (a set specific for the target clone and another
gene-specific set multiplexed with the target probe). PCR reactions were set up using
TaqMan® One-Step RT-PCR Master Mix (Applied Biosystems, Catalog No. 4313803)
following manufacturer's instructions. Reverse transcription was performed at 48 °C for 30
minutes followed by amplification/PCR cycles as follows: 95 °C 10 min, then 40 cycles of
95 °C for 15 seconds, 60 °C for 1 minute. Results were recorded as CT values (cycle at which a
given sample crosses a threshold level of fluorescence) using a log scale, with the difference
in RNA concentration between a given sample and the sample with the lowest CT value
being represented as 2 to the power of delta CT. The percent relative expression is then
obtained by taking the reciprocal of this RNA difference and multiplying by 100.
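The calculation described above can be sketched as follows. The function name and the input format (a mapping from sample label to CT value) are assumptions made for illustration; the arithmetic follows the text: delta CT is taken relative to the sample with the lowest CT, the RNA difference is 2 to the power of delta CT, and the percent relative expression is 100 times its reciprocal.

```python
def percent_relative_expression(ct_by_sample):
    """Map each sample label to its percent expression relative to the lowest-CT sample."""
    if not ct_by_sample:
        raise ValueError("at least one CT value is required")
    lowest_ct = min(ct_by_sample.values())
    result = {}
    for sample, ct in ct_by_sample.items():
        delta_ct = ct - lowest_ct          # cycles behind the highest-expressing sample
        rna_difference = 2.0 ** delta_ct   # fold difference in RNA concentration
        result[sample] = 100.0 / rna_difference
    return result
```

Under this scheme the sample with the lowest CT is always reported as 100%, and each additional cycle halves the reported value.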

When working with sscDNA samples, normalized sscDNA was used as described
previously for RNA samples. PCR reactions containing one or two sets of probe and primers
were set up as described previously, using 1X TaqMan® Universal Master mix (Applied
Biosystems; catalog No. 4324020), following the manufacturer's instructions. PCR
amplification was performed as follows: 95 °C 10 min, then 40 cycles of 95 °C for 15
seconds, 60 °C for 1 minute. Results were analyzed and processed as described previously.

Panels 1, 1.1, 1.2, and 1.3D 

The plates for Panels 1, 1.1, 1.2 and 1.3D include 2 control wells (genomic DNA
control and chemistry control) and 94 wells containing cDNA from various samples. The 
samples in these panels are broken into 2 classes: samples derived from cultured cell lines 
and samples derived from primary normal tissues. The cell lines are derived from cancers of 
the following types: lung cancer, breast cancer, melanoma, colon cancer, prostate cancer, 
CNS cancer, squamous cell carcinoma, ovarian cancer, liver cancer, renal cancer, gastric 
cancer and pancreatic cancer. Cell lines used in these panels are widely available through the 
American Type Culture Collection (ATCC), a repository for cultured cell lines, and were 
cultured using the conditions recommended by the ATCC. The normal tissues found on these 
panels are comprised of samples derived from all major organ systems from single adult 
individuals or fetuses. These samples are derived from the following organs: adult skeletal 
muscle, fetal skeletal muscle, adult heart, fetal heart, adult kidney, fetal kidney, adult liver, 
fetal liver, adult lung, fetal lung, various regions of the brain, the spleen, bone marrow, lymph 
node, pancreas, salivary gland, pituitary gland, adrenal gland, spinal cord, thymus, stomach, 
small intestine, colon, bladder, trachea, breast, ovary, uterus, placenta, prostate, testis and 
adipose. 

In the results for Panels 1, 1.1, 1.2 and 1.3D, the following abbreviations are used:

ca. = carcinoma,
* = established from metastasis,
met = metastasis,
s cell var = small cell variant,
non-s = non-sm = non-small,
squam = squamous,
pl. eff = pl effusion = pleural effusion,
glio = glioma,
astro = astrocytoma, and
neuro = neuroblastoma.

General_screening_panel_v1.4, v1.5 and v1.6

The plates for Panels 1.4, v1.5 and v1.6 include two control wells (genomic DNA
control and chemistry control) and 94 wells containing cDNA from various samples. The
samples in Panels 1.4, v1.5 and v1.6 are broken into 2 classes: samples derived from cultured
cell lines and samples derived from primary normal tissues. The cell lines are derived from
cancers of the following types: lung cancer, breast cancer, melanoma, colon cancer, prostate
cancer, CNS cancer, squamous cell carcinoma, ovarian cancer, liver cancer, renal cancer,
gastric cancer and pancreatic cancer. Cell lines used in Panels 1.4, v1.5 and v1.6 are widely
available through the American Type Culture Collection (ATCC), a repository for cultured
cell lines, and were cultured using the conditions recommended by the ATCC. The normal
tissues found on Panels 1.4, v1.5 and v1.6 are comprised of pools of samples derived from all
major organ systems from 2 to 5 different adult individuals or fetuses. These samples are
derived from the following organs: adult skeletal muscle, fetal skeletal muscle, adult heart,
fetal heart, adult kidney, fetal kidney, adult liver, fetal liver, adult lung, fetal lung, various
regions of the brain, the spleen, bone marrow, lymph node, pancreas, salivary gland, pituitary
gland, adrenal gland, spinal cord, thymus, stomach, small intestine, colon, bladder, trachea,
breast, ovary, uterus, placenta, prostate, testis and adipose. Abbreviations are as described for
Panels 1, 1.1, 1.2, and 1.3D.

Panels 2D, 2.2, 2.3 and 2.4 

The plates for Panels 2D, 2.2, 2.3 and 2.4 generally include two control wells and 94
test samples composed of RNA or cDNA isolated from human tissue procured by surgeons
working in close cooperation with the National Cancer Institute's Cooperative Human Tissue
Network (CHTN) or the National Disease Research Interchange (NDRI) or from Ardais or
Clinomics. The tissues are derived from human malignancies and, in cases where indicated,
many malignant tissues have "matched margins" obtained from noncancerous tissue just
adjacent to the tumor. These are termed normal adjacent tissues and are denoted "NAT" in
the results below. The tumor tissue and the "matched margins" are evaluated by two
independent pathologists (the surgical pathologists and again by a pathologist at
NDRI/CHTN/Ardais/Clinomics). Unmatched RNA samples from tissues without malignancy
(normal tissues) were also obtained from Ardais or Clinomics. This analysis provides a gross
histopathological assessment of tumor differentiation grade. Moreover, most samples include
the original surgical pathology report that provides information regarding the clinical stage of
the patient. These matched margins are taken from the tissue surrounding (i.e. immediately
proximal) to the zone of surgery (designated "NAT", for normal adjacent tissue, in Table
RR). In addition, RNA and cDNA samples were obtained from various human tissues derived
from autopsies performed on elderly people or sudden death victims (accidents, etc.). These
tissues were ascertained to be free of disease and were purchased from various commercial
sources such as Clontech (Palo Alto, CA), Research Genetics, and Invitrogen. General
oncology screening panel_v_2.4 is an updated version of Panel 2D.

HASS Panel v 1.0 

The HASS panel v 1.0 plates are comprised of 93 cDNA samples and two controls.
Specifically, 81 of these samples are derived from cultured human cancer cell lines that had
been subjected to serum starvation, acidosis and anoxia for different time periods as well as
controls for these treatments, 3 samples of human primary cells, 9 samples of malignant brain
cancer (4 medulloblastomas and 5 glioblastomas) and 2 controls. The human cancer cell lines
are obtained from ATCC (American Type Culture Collection) and fall into the following
tissue groups: breast cancer, prostate cancer, bladder carcinomas, pancreatic cancers and CNS
cancer cell lines. These cancer cells are all cultured under standard recommended conditions.
The treatments used (serum starvation, acidosis and anoxia) have been previously published
in the scientific literature. The primary human cells were obtained from Clonetics
(Walkersville, MD) and were grown in the media and conditions recommended by Clonetics.
The malignant brain cancer samples are obtained as part of a collaboration (Henry Ford
Cancer Center) and are evaluated by a pathologist prior to CuraGen receiving the samples.
RNA was prepared from these samples using the standard procedures. The genomic and
chemistry control wells have been described previously.

ARDAIS Panel v 1.0 

The plates for ARDAIS panel v 1.0 generally include 2 control wells and 22 test
samples composed of RNA isolated from human tissue procured by surgeons working in
close cooperation with Ardais Corporation. The tissues are derived from human lung
malignancies (lung adenocarcinoma or lung squamous cell carcinoma) and, in cases where
indicated, many malignant samples have "matched margins" obtained from noncancerous
lung tissue just adjacent to the tumor. These matched margins are taken from the tissue
surrounding (i.e. immediately proximal) to the zone of surgery (designated "NAT", for
normal adjacent tissue) in the results below. The tumor tissue and the "matched margins" are
evaluated by independent pathologists (the surgical pathologists and again by a pathologist at
Ardais). Unmatched malignant and non-malignant RNA samples from lungs were also
obtained from Ardais. Additional information from Ardais provides a gross histopathological
assessment of tumor differentiation grade and stage. Moreover, most samples include the
original surgical pathology report that provides information regarding the clinical state of the
patient.

Panels 3D and 3.1 

The plates of Panels 3D and 3.1 are comprised of 94 cDNA samples and two control
samples. Specifically, 92 of these samples are derived from cultured human cancer cell lines,
2 samples of human primary cerebellar tissue and 2 controls. The human cell lines are
generally obtained from ATCC (American Type Culture Collection), NCI or the German
tumor cell bank and fall into the following tissue groups: squamous cell carcinoma of the
tongue, breast cancer, prostate cancer, melanoma, epidermoid carcinoma, sarcomas, bladder
carcinomas, pancreatic cancers, kidney cancers, leukemias/lymphomas,
ovarian/uterine/cervical, gastric, colon, lung and CNS cancer cell lines. In addition, there are
two independent samples of cerebellum. These cells are all cultured under standard
recommended conditions and RNA extracted using the standard procedures. The cell lines in
panel 3D and 1.3D are among the most common cell lines used in the scientific literature.
Oncology_cell_line_screening_panel_v3.2 is an updated version of Panel 3. The cell lines in
panel 3D, 3.1, 1.3D and oncology_cell_line_screening_panel_v3.2 are among the most common
cell lines used in the scientific literature.

Panels 4D, 4R, and 4.1D

Panel 4 includes samples on a 96 well plate (2 control wells, 94 test samples)
composed of RNA (Panel 4R) or cDNA (Panels 4D/4.1D) isolated from various human cell
lines or tissues related to inflammatory conditions. Total RNA from control normal tissues
such as colon and lung (Stratagene, La Jolla, CA) and thymus and kidney (Clontech) was
employed. Total RNA from liver tissue from cirrhosis patients and kidney from lupus patients
was obtained from BioChain (Biochain Institute, Inc., Hayward, CA). Intestinal tissue for
RNA preparation from patients diagnosed as having Crohn's disease and ulcerative colitis
was obtained from the National Disease Research Interchange (NDRI) (Philadelphia, PA).

Astrocytes, lung fibroblasts, dermal fibroblasts, coronary artery smooth muscle cells,
small airway epithelium, bronchial epithelium, microvascular dermal endothelial cells,
microvascular lung endothelial cells, human pulmonary aortic endothelial cells, human
umbilical vein endothelial cells were all purchased from Clonetics (Walkersville, MD) and
grown in the media supplied for these cell types by Clonetics. These primary cell types were
activated with various cytokines or combinations of cytokines for 6 and/or 12-14 hours, as
indicated. The following cytokines were used: IL-1 beta at approximately 1-5ng/ml, TNF
alpha at approximately 5-10ng/ml, IFN gamma at approximately 20-50ng/ml, IL-4 at
approximately 5-10ng/ml, IL-9 at approximately 5-10ng/ml, IL-13 at approximately
5-10ng/ml. Endothelial cells were sometimes starved for various times by culture in the basal
media from Clonetics with 0.1% serum.

Mononuclear cells were prepared from blood of employees at CuraGen Corporation,
using Ficoll. LAK cells were prepared from these cells by culture in DMEM 5% FCS
(Hyclone), 100µM non essential amino acids (Gibco/Life Technologies, Rockville, MD),
1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM Hepes
(Gibco) and Interleukin 2 for 4-6 days. Cells were then either activated with 10-20ng/ml
PMA and 1-2µg/ml ionomycin, IL-12 at 5-10ng/ml, IFN gamma at 20-50ng/ml and IL-18 at
5-10ng/ml for 6 hours. In some cases, mononuclear cells were cultured for 4-5 days in
DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium
pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM Hepes (Gibco) with PHA
(phytohemagglutinin) or PWM (pokeweed mitogen) at approximately 5µg/ml. Samples were
taken at 24, 48 and 72 hours for RNA preparation. MLR (mixed lymphocyte reaction)
samples were obtained by taking blood from two donors, isolating the mononuclear cells
using Ficoll and mixing the isolated mononuclear cells 1:1 at a final concentration of
approximately 2x10^6 cells/ml in DMEM 5% FCS (Hyclone), 100µM non essential amino
acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol (5.5x10^-5 M) (Gibco), and
10mM Hepes (Gibco). The MLR was cultured and samples taken at various time points
ranging from 1-7 days for RNA preparation.

Monocytes were isolated from mononuclear cells using CD14 Miltenyi Beads, +ve
VS selection columns and a Vario Magnet according to the manufacturer's instructions.
Monocytes were differentiated into dendritic cells by culture in DMEM 5% fetal calf serum
(FCS) (Hyclone, Logan, UT), 100µM non essential amino acids (Gibco), 1mM sodium
pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM Hepes (Gibco), 50ng/ml
GMCSF and 5ng/ml IL-4 for 5-7 days. Macrophages were prepared by culture of monocytes
for 5-7 days in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM
sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), 10mM Hepes (Gibco) and
10% AB Human Serum or MCSF at approximately 50ng/ml. Monocytes, macrophages and
dendritic cells were stimulated for 6 and 12-14 hours with lipopolysaccharide (LPS) at
100ng/ml. Dendritic cells were also stimulated with anti-CD40 monoclonal antibody
(Pharmingen) at 10µg/ml for 6 and 12-14 hours.

CD4 lymphocytes, CD8 lymphocytes and NK cells were also isolated from
mononuclear cells using CD4, CD8 and CD56 Miltenyi beads, positive VS selection columns
and a Vario Magnet according to the manufacturer's instructions. CD45RA and CD45RO
CD4 lymphocytes were isolated by depleting mononuclear cells of CD8, CD56, CD14 and
CD19 cells using CD8, CD56, CD14 and CD19 Miltenyi beads and positive selection.
CD45RO beads were then used to isolate the CD45RO CD4 lymphocytes with the remaining
cells being CD45RA CD4 lymphocytes. CD45RA CD4, CD45RO CD4 and CD8
lymphocytes were placed in DMEM 5% FCS (Hyclone), 100µM non essential amino acids
(Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM
Hepes (Gibco) and plated at 10^6 cells/ml onto Falcon 6 well tissue culture plates that had been
coated overnight with 0.5µg/ml anti-CD28 (Pharmingen) and 3µg/ml anti-CD3 (OKT3,
ATCC) in PBS. After 6 and 24 hours, the cells were harvested for RNA preparation. To
prepare chronically activated CD8 lymphocytes, we activated the isolated CD8 lymphocytes
for 4 days on anti-CD28 and anti-CD3 coated plates and then harvested the cells and
expanded them in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco),
1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM Hepes
(Gibco) and IL-2. The expanded CD8 cells were then activated again with plate bound
anti-CD3 and anti-CD28 for 4 days and expanded as before. RNA was isolated 6 and 24
hours after the second activation and after 4 days of the second expansion culture. The
isolated NK cells were cultured in DMEM 5% FCS (Hyclone), 100µM non essential amino
acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and
10mM Hepes (Gibco) and IL-2 for 4-6 days before RNA was prepared.

To obtain B cells, tonsils were procured from NDRI. The tonsil was cut up with
sterile dissecting scissors and then passed through a sieve. Tonsil cells were then spun down
and resuspended at 10^6 cells/ml in DMEM 5% FCS (Hyclone), 100µM non essential amino
acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and
10mM Hepes (Gibco). To activate the cells, we used PWM at 5ng/ml or anti-CD40
(Pharmingen) at approximately 10µg/ml and IL-4 at 5-10ng/ml. Cells were harvested for
RNA preparation at 24, 48 and 72 hours.

To prepare the primary and secondary Th1/Th2 and Tr1 cells, six-well Falcon plates
were coated overnight with 10ng/ml anti-CD28 (Pharmingen) and 2ng/ml OKT3 (ATCC),
and then washed twice with PBS. Umbilical cord blood CD4 lymphocytes (Poietic Systems,
German Town, MD) were cultured at 10^5-10^6 cells/ml in DMEM 5% FCS (Hyclone), 100µM
non essential amino acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol
5.5x10^-5 M (Gibco), 10mM Hepes (Gibco) and IL-2 (4ng/ml). IL-12 (5ng/ml) and anti-IL4
(1ng/ml) were used to direct to Th1, while IL-4 (5ng/ml) and anti-IFN gamma (1ng/ml) were
used to direct to Th2 and IL-10 at 5ng/ml was used to direct to Tr1. After 4-5 days, the
activated Th1, Th2 and Tr1 lymphocytes were washed once in DMEM and expanded for 4-7
days in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM sodium
pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), 10mM Hepes (Gibco) and IL-2
(1ng/ml). Following this, the activated Th1, Th2 and Tr1 lymphocytes were re-stimulated for
5 days with anti-CD28/OKT3 and cytokines as described above, but with the addition of
anti-CD95L (1µg/ml) to prevent apoptosis. After 4-5 days, the Th1, Th2 and Tr1
lymphocytes were washed and then expanded again with IL-2 for 4-7 days. Activated Th1
and Th2 lymphocytes were maintained in this way for a maximum of three cycles. RNA was
prepared from primary and secondary Th1, Th2 and Tr1 after 6 and 24 hours following the
second and third activations with plate bound anti-CD3 and anti-CD28 mAbs and 4 days into
the second and third expansion cultures in Interleukin 2.

The following leukocyte cell lines were obtained from the ATCC: Ramos, EOL-1,
KU-812. EOL cells were further differentiated by culture in 0.1mM dbcAMP at
5x10^5 cells/ml for 8 days, changing the media every 3 days and adjusting the cell
concentration to 5x10^5 cells/ml. For the culture of these cells, we used DMEM or RPMI (as
recommended by the ATCC), with the addition of 5% FCS (Hyclone), 100µM non essential
amino acids (Gibco), 1mM sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco),
10mM Hepes (Gibco). RNA was either prepared from resting cells or cells activated with
PMA at 10ng/ml and ionomycin at 1µg/ml for 6 and 14 hours. Keratinocyte line CCD106 and
an airway epithelial tumor line NCI-H292 were also obtained from the ATCC. Both were
cultured in DMEM 5% FCS (Hyclone), 100µM non essential amino acids (Gibco), 1mM
sodium pyruvate (Gibco), mercaptoethanol 5.5x10^-5 M (Gibco), and 10mM Hepes (Gibco).
CCD1106 cells were activated for 6 and 14 hours with approximately 5 ng/ml TNF alpha and
1ng/ml IL-1 beta, while NCI-H292 cells were activated for 6 and 14 hours with the following
cytokines: 5ng/ml IL-4, 5ng/ml IL-9, 5ng/ml IL-13 and 25ng/ml IFN gamma.

For these cell lines and blood cells, RNA was prepared by lysing approximately
10^7 cells/ml using Trizol (Gibco BRL). Briefly, 1/10 volume of bromochloropropane
(Molecular Research Corporation) was added to the RNA sample, vortexed and after 10
minutes at room temperature, the tubes were spun at 14,000 rpm in a Sorvall SS34 rotor. The
aqueous phase was removed and placed in a 15ml Falcon Tube. An equal volume of
isopropanol was added and left at -20 °C overnight. The precipitated RNA was spun down at
9,000 rpm for 15 min in a Sorvall SS34 rotor and washed in 70% ethanol. The pellet was
redissolved in 300µl of RNAse-free water and 35µl buffer (Promega), 5µl DTT, 7µl RNAsin
and 8µl DNAse were added. The tube was incubated at 37 °C for 30 minutes to remove
contaminating genomic DNA, extracted once with phenol chloroform and re-precipitated
with 1/10 volume of 3M sodium acetate and 2 volumes of 100% ethanol. The RNA was spun
down and placed in RNAse free water. RNA was stored at -80 °C.
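As a worked illustration of the re-precipitation step, the sketch below totals the DNase digestion mix from the volumes given above and derives the sodium acetate and ethanol volumes. It assumes that "1/10 volume" and "2 volumes" both refer to the volume of the aqueous sample being precipitated, and that no volume is lost during the phenol chloroform extraction; the function name is hypothetical.

```python
def precipitation_volumes(sample_ul: float) -> dict:
    """Volumes (in microliters) of 3M sodium acetate and 100% ethanol for a sample."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "sodium_acetate_3M_ul": sample_ul / 10.0,  # 1/10 volume
        "ethanol_100pct_ul": 2.0 * sample_ul,      # 2 volumes
    }


# Digestion mix from the text: water + buffer + DTT + RNAsin + DNAse, in microliters.
digestion_mix_ul = 300 + 35 + 5 + 7 + 8

volumes = precipitation_volumes(digestion_mix_ul)
```

In practice the recovered aqueous phase is usually somewhat smaller than the starting mix, so the measured volume would be passed in instead.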

AI_comprehensive panel_v1.0

The plates for AI_comprehensive panel_v1.0 include two control wells and 89 test
samples comprised of cDNA isolated from surgical and postmortem human tissues obtained
from the Backus Hospital and Clinomics (Frederick, MD). Total RNA was extracted from
tissue samples from the Backus Hospital in the Facility at CuraGen. Total RNA from other
tissues was obtained from Clinomics.

Joint tissues including synovial fluid, synovium, bone and cartilage were obtained
from patients undergoing total knee or hip replacement surgery at the Backus Hospital.
Tissue samples were immediately snap frozen in liquid nitrogen to ensure that isolated RNA
was of optimal quality and not degraded. Additional samples of osteoarthritis and rheumatoid
arthritis joint tissues were obtained from Clinomics. Normal control tissues were supplied by
Clinomics and were obtained during autopsy of trauma victims.

Surgical specimens of psoriatic tissues and adjacent matched tissues were provided as
total RNA by Clinomics. Two male and two female patients were selected between the ages
of 25 and 47. None of the patients were taking prescription drugs at the time samples were
isolated.

Surgical specimens of diseased colon from patients with ulcerative colitis and Crohn's
disease and adjacent matched tissues were obtained from Clinomics. Bowel tissue from three
female and three male Crohn's patients between the ages of 41-69 was used. Two patients
were not on prescription medication while the others were taking dexamethasone,
phenobarbital, or tylenol. Ulcerative colitis tissue was from three male and four female
patients. Four of the patients were taking lebvid and two were on phenobarbital.



Total RNA from post mortem lung tissue from trauma victims with no disease or with
emphysema, asthma or COPD was purchased from Clinomics. Emphysema patients ranged in
age from 40-70 and all were smokers; this age range was chosen to focus on patients with
cigarette-linked emphysema and to avoid those patients with alpha-1 anti-trypsin deficiencies.
Asthma patients ranged in age from 36-75, and smokers were excluded to avoid patients
who could also have COPD. COPD patients ranged in age from 35-80 and included both
smokers and non-smokers. Most patients were taking corticosteroids and bronchodilators.

In the labels employed to identify tissues in the AI_comprehensive panel_v1.0 panel,
the following abbreviations are used:



AI = Autoimmunity
Syn = Synovial
Normal = No apparent disease
Rep22 / Rep20 = individual patients
RA = Rheumatoid arthritis
Backus = From Backus Hospital
OA = Osteoarthritis
(SS) (BA) (MF) = Individual patients
Adj = Adjacent tissue
Match control = adjacent tissues
-M = Male
-F = Female
COPD = Chronic obstructive pulmonary disease

Panels 5D and 5I

The plates for Panel 5D and 5I include two control wells and a variety of cDNAs
isolated from human tissues and cell lines with an emphasis on metabolic diseases. Metabolic
tissues were obtained from patients enrolled in the Gestational Diabetes study. Cells were
obtained during different stages in the differentiation of adipocytes from human
mesenchymal stem cells. Human pancreatic islets were also obtained.

In the Gestational Diabetes study, subjects are young (18-40 years), otherwise
healthy women with and without gestational diabetes undergoing routine (elective) Caesarean
section. After delivery of the infant, when the surgical incisions were being repaired/closed,
the obstetrician removed a small sample (<1 cc) of the exposed metabolic tissues during the
closure of each surgical level. The biopsy material was rinsed in sterile saline, blotted and
fast frozen within 5 minutes from the time of removal. The tissue was then flash frozen in
liquid nitrogen and stored, individually, in sterile screw-top tubes and kept on dry ice for
shipment to or to be picked up by CuraGen. The metabolic tissues of interest include uterine
wall (smooth muscle), visceral adipose, skeletal muscle (rectus) and subcutaneous adipose.
Patient descriptions are as follows:

Patient 2       Diabetic Hispanic, overweight, not on insulin
Patient 7-9     Nondiabetic Caucasian and obese (BMI>30)
Patient 10      Diabetic Hispanic, overweight, on insulin
Patient 11      Nondiabetic African American and overweight
Patient 12      Diabetic Hispanic on insulin

Adipocyte differentiation was induced in donor progenitor cells obtained from Osirus
(a division of Clonetics/BioWhittaker) in triplicate, except for Donor 3U which had only two
replicates. Scientists at Clonetics isolated, grew and differentiated human mesenchymal stem
cells (HuMSCs) for CuraGen based on the published protocol found in Mark F. Pittenger, et
al., Multilineage Potential of Adult Human Mesenchymal Stem Cells, Science Apr 2 1999:
143-147. Clonetics provided Trizol lysates or frozen pellets suitable for mRNA isolation and
ds cDNA production. A general description of each donor is as follows:

Donor 2 and 3 U: Mesenchymal Stem cells, Undifferentiated Adipose
Donor 2 and 3 AM: Adipose, Adipose Midway Differentiated
Donor 2 and 3 AD: Adipose, Adipose Differentiated

Human cell lines were generally obtained from ATCC (American Type Culture
Collection), NCI or the German tumor cell bank and fall into the following tissue groups:
kidney proximal convoluted tubule, uterine smooth muscle cells, small intestine, liver HepG2
cancer cells, heart primary stromal cells, and adrenal cortical adenoma cells. These cells are
all cultured under standard recommended conditions and RNA extracted using the standard
procedures. All samples were processed at CuraGen to produce single stranded cDNA.

Panel 5I contains all samples previously described with the addition of pancreatic
islets from a 58 year old female patient obtained from the Diabetes Research Institute at the
University of Miami School of Medicine. Islet tissue was processed to total RNA at an
outside source and delivered to CuraGen for addition to panel 5I.

In the labels employed to identify tissues in the 5D and 5I panels, the following
abbreviations are used:

GO Adipose = Greater Omentum Adipose
SK = Skeletal Muscle
UT = Uterus
PL = Placenta
AD = Adipose Differentiated
AM = Adipose Midway Differentiated
U = Undifferentiated Stem Cells
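
As an illustration of how these abbreviations appear in the sample labels of the 5D and 5I tables (for example "97457_Patient-02go_adipose"), the following is a minimal sketch that splits a patient label into its parts. The label pattern, the function name parse_patient_label and the lower-case tissue codes are assumptions inferred from the labels in the tables, not a format defined by this document.

```python
import re

# Tissue codes from the abbreviation list; the lower-case two-letter forms
# are assumed to be how they are embedded in the sample labels.
TISSUE_CODES = {
    "go": "Greater Omentum Adipose",
    "sk": "Skeletal Muscle",
    "ut": "Uterus",
    "pl": "Placenta",
}

# Assumed pattern: <sample number>_Patient-<patient number><tissue code>_<tissue text>
LABEL_RE = re.compile(
    r"^(?P<sample>\d+)_Patient-(?P<patient>\d+)(?P<code>[a-z]{2})_(?P<text>.+)$"
)


def parse_patient_label(label):
    """Return the parts of a patient sample label, or None if it does not match."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        return None
    code = match.group("code")
    return {
        "sample_id": match.group("sample"),
        "patient": int(match.group("patient")),
        "tissue_code": code,
        "tissue": TISSUE_CODES.get(code, "unknown"),
    }
```

Labels that do not follow the assumed pattern, such as the donor cell-line labels, simply return None and would need their own pattern.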

Panel CNSD.01 

The plates for Panel CNSD.01 include two control wells and 94 test samples
comprised of cDNA isolated from postmortem human brain tissue obtained from the Harvard
Brain Tissue Resource Center. Brains are removed from calvaria of donors between 4 and 24
hours after death, sectioned by neuroanatomists, and frozen at -80°C in liquid nitrogen vapor.
All brains are sectioned and examined by neuropathologists to confirm diagnoses with clear
associated neuropathology.
Disease diagnoses are taken from patient records. The panel contains two brains from
each of the following diagnoses: Alzheimer's disease, Parkinson's disease, Huntington's
disease, Progressive Supranuclear Palsy, Depression, and "Normal controls". Within each of
these brains, the following regions are represented: cingulate gyrus, temporal pole, globus
pallidus, substantia nigra, Brodmann Area 4 (primary motor strip), Brodmann Area 7 (parietal
cortex), Brodmann Area 9 (prefrontal cortex), and Brodmann Area 17 (occipital cortex). Not all
brain regions are represented in all cases; e.g., Huntington's disease is characterized in part by
neurodegeneration in the globus pallidus, thus this region is impossible to obtain from
confirmed Huntington's cases. Likewise, Parkinson's disease is characterized by degeneration
of the substantia nigra, making this region more difficult to obtain. Normal control brains
were examined for neuropathology and found to be free of any pathology consistent with
neurodegeneration.

In the labels employed to identify tissues in the CNS panel, the following
abbreviations are used:

PSP = Progressive supranuclear palsy
Sub Nigra = Substantia nigra
Glob Palladus = Globus pallidus
Temp Pole = Temporal pole
Cing Gyr = Cingulate gyrus
BA 4 = Brodmann Area 4

Panel CNS_Neurodegeneration_V1.0

The plates for Panel CNS_Neurodegeneration_V1.0 include two control wells and 47
test samples comprised of cDNA isolated from postmortem human brain tissue obtained from
the Harvard Brain Tissue Resource Center (McLean Hospital) and the Human Brain and
Spinal Fluid Resource Center (VA Greater Los Angeles Healthcare System). Brains are
removed from calvaria of donors between 4 and 24 hours after death, sectioned by
neuroanatomists, and frozen at -80°C in liquid nitrogen vapor. All brains are sectioned and
examined by neuropathologists to confirm diagnoses with clear associated neuropathology.

Disease diagnoses are taken from patient records. The panel contains six brains from
Alzheimer's disease (AD) patients, and eight brains from "Normal controls" who showed no
evidence of dementia prior to death. The eight normal control brains are divided into two
categories: controls with no dementia and no Alzheimer's like pathology (Controls) and
controls with no dementia but evidence of severe Alzheimer's like pathology (specifically
senile plaque load rated as level 3 on a scale of 0-3; 0 = no evidence of plaques, 3 = severe
AD senile plaque load). Within each of these brains, the following regions are represented:
hippocampus, temporal cortex (Brodmann Area 21), parietal cortex (Brodmann Area 7), and
occipital cortex (Brodmann Area 17). These regions were chosen to encompass all levels of
neurodegeneration in AD. The hippocampus is a region of early and severe neuronal loss in
AD; the temporal cortex is known to show neurodegeneration in AD after the hippocampus;
the parietal cortex shows moderate neuronal death in the late stages of the disease; the
occipital cortex is spared in AD and therefore acts as a "control" region within AD patients.
Not all brain regions are represented in all cases.

In the labels employed to identify tissues in the CNS_Neurodegeneration_V1.0 panel,
the following abbreviations are used:

AD = Alzheimer's disease brain; patient was demented and showed AD-like pathology upon autopsy
Control = Control brains; patient not demented, showing no neuropathology
Control (Path) = Control brains; patient not demented but showing severe AD-like pathology
SupTemporal Ctx = Superior Temporal Cortex
Inf Temporal Ctx = Inferior Temporal Cortex
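
The three donor categories described for this panel can be written as a small decision rule. The sketch below is illustrative only: the function name classify_brain is hypothetical, and treating any non-zero plaque load in a demented patient as "AD-like pathology" is an assumption, since the text gives an explicit plaque-load level only for the Control (Path) group.

```python
def classify_brain(demented, plaque_load):
    """Assign a panel label from dementia status and senile plaque load (0-3 scale)."""
    if plaque_load not in (0, 1, 2, 3):
        raise ValueError("plaque load is rated on a scale of 0-3")
    if demented and plaque_load > 0:
        # Assumption: any plaque load above 0 counts as AD-like pathology.
        return "AD"
    if not demented and plaque_load == 0:
        return "Control"
    if not demented and plaque_load == 3:
        # Stated in the text: no dementia but plaque load rated as level 3.
        return "Control (Path)"
    # Combinations the text does not describe are left unassigned.
    return "unassigned"
```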

A. NOV7a: Hsapiens CAB3 

Expression of gene NOV7a was assessed using the primer-probe set Ag4264,
described in Table AA. Results of the RTQ-PCR runs are shown in Tables AB and AC.
Table AA. Probe Name Ag4264

Primers  Sequences                                    Length  Start Position  SEQ ID No
Forward  5'-gcatcctgtcatttctgttctt-3'                 22      2294            345
Probe    TET-5'-tccctcatacatctttggagaaccgg-3'-TAMRA   26      2317            346
Reverse  5'-cagctatgagtcagggaacaaa-3'                 22      2352            347
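
The lengths and start positions listed for the Ag4264 set can be cross-checked with a short calculation. This is a sketch under an assumption not stated in the table: that each start position is the lowest coordinate occupied by that oligo on a common sequence numbering, so that an oligo ends at start + length - 1. The names Oligo, amplicon_length and probe_between are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oligo:
    name: str
    sequence: str
    start: int  # start position as listed in Table AA

    @property
    def end(self):
        # Assumption: the oligo occupies start .. start + length - 1.
        return self.start + len(self.sequence) - 1


FORWARD = Oligo("Forward", "gcatcctgtcatttctgttctt", 2294)
PROBE = Oligo("Probe", "tccctcatacatctttggagaaccgg", 2317)
REVERSE = Oligo("Reverse", "cagctatgagtcagggaacaaa", 2352)


def amplicon_length(forward, reverse):
    """Span from the first base of the forward primer to the last base of the reverse primer."""
    return reverse.end - forward.start + 1


def probe_between(forward, probe, reverse):
    """True if the probe lies strictly between the two primers without overlap."""
    return forward.end < probe.start and probe.end < reverse.start
```

Under that assumption the three oligos do not overlap and the set spans 80 positions, which is in the range typically used for TaqMan-style assays.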



Table AB. General_screening_panel_v1.4

Rel. Exp.(%) columns: Ag4264, Run 222171301

Tissue Name                     Rel.    Tissue Name                         Rel.
                                Exp.(%)                                     Exp.(%)
Adipose                         4.0     Renal ca. TK-10                     59.9
Melanoma* Hs688(A).T            24.7    Bladder                             7.7
Melanoma* Hs688(B).T            18.7    Gastric ca. (liver met.) NCI-N87    49.3
Melanoma* M14                   31.2    Gastric ca. KATO III                36.9
Melanoma* LOXIMVI               22.7    Colon ca. SW-948                    15.1
Melanoma* SK-MEL-5              23.7    Colon ca. SW480                     79.0
Squamous cell carcinoma SCC-4   8.1     Colon ca.* (SW480 met) SW620        10.2
Testis Pool                     3.1     Colon ca. HT29                      31.4
Prostate ca.* (bone met) PC-3   34.2    Colon ca. HCT-116                   41.5
Prostate Pool                   4.6     Colon ca. CaCo-2                    25.7
Placenta                        4.1     Colon cancer tissue                 31.6
Uterus Pool                     8.7     Colon ca. SW1116                    12.3
Ovarian ca. OVCAR-3             79.0    Colon ca. Colo-205                  2.8
Ovarian ca. SK-OV-3             40.3    Colon ca. SW-48                     14.3
Ovarian ca. OVCAR-4             13.6    Colon Pool                          55.9
Ovarian ca. OVCAR-5             43.2    Small Intestine Pool                27.4
Ovarian ca. IGROV-1             33.7    Stomach Pool                        22.4
Ovarian ca. OVCAR-8             49.7    Bone Marrow Pool                    15.6
Ovary                           19.9    Fetal Heart                         1.5
Breast ca. MCF-7                48.6    Heart Pool                          9.9
Breast ca. MDA-MB-231           8.1     Lymph Node Pool                     49.7
Breast ca. BT 549               85.3    Fetal Skeletal Muscle               2.1
Breast ca. T47D                 100.0   Skeletal Muscle Pool                1.1
Breast ca. MDA-N                1L7     Spleen Pool                         1.9
Breast Pool                     35.6    Thymus Pool                         19.9
Trachea                         4.7     CNS cancer (glio/astro) U87-MG      F9.2 1
Lung                            33.9    CNS cancer (glio/astro) U-118-MG    59.0
Fetal Lung                      11.0    CNS cancer (neuro;met) SK-N-AS      85.9
Lung ca. NCI-N417               2.7     CNS cancer (astro) SF-539           15.9
Lung ca. LX-1                   52.5    CNS cancer (astro) SNB-75           35.6
Lung ca. NCI-H146               34.6    CNS cancer (glio) SNB-19            38.7
Lung ca. SHP-77                 20.9    CNS cancer (glio) SF-295            67.8
Lung ca. A549                   31.0    Brain (Amygdala) Pool               27.9
Lung ca. NCI-H526               8.1     Brain (cerebellum)                  47.0
Lung ca. NCI-H23                50.0    Brain (fetal)                       63.3
Lung ca. NCI-H460               12.1    Brain (Hippocampus) Pool            22.5
Lung ca. HOP-62                 2^4     Cerebral Cortex Pool                42.9
Lung ca. NCI-H522               92.0    Brain (Substantia nigra) Pool       43.5
Liver                           0.2     Brain (Thalamus) Pool               38.7
Fetal Liver                     0.7     Brain (whole)                       35.1
Liver ca. HepG2                 0.5     Spinal Cord Pool                    3.5
Kidney Pool                     53.2    Adrenal Gland                       1.4
Fetal Kidney                    8.5     Pituitary gland Pool                2.3
Renal ca. 786-0                 66.9    Salivary Gland                      1.4
Renal ca. A498                  12.0    Thyroid (female)                    5.5
Renal ca. ACHN                  56.3    Pancreatic ca. CAPAN2               29.1
Renal ca. UO-31                 70.7    Pancreas Pool                       42.3



Table AC. Panel 5 Islet

Rel. Exp.(%) columns: Ag4264, Run 181325881

Tissue Name                                   Rel.     Tissue Name                                   Rel.
                                              Exp.(%)                                                Exp.(%)
97457_Patient-02go_adipose                    14.7     94709_Donor 2 AM - A_adipose                  32.1
97476_Patient-07sk_skeletal muscle            13.3     94710_Donor 2 AM - B_adipose                  14.1
97477_Patient-07ut_uterus                     26.4     94711_Donor 2 AM - C_adipose                  7.0
97478_Patient-07pl_placenta                   8.6      94712_Donor 2 AD - A_adipose                  18.4
99167_Bayer Patient 1                         100.0    94713_Donor 2 AD - B_adipose                  25.5
97482_Patient-08ut_uterus                     6.0      94714_Donor 2 AD - C_adipose                  24.5
97483_Patient-08pl_placenta                   5.3      94742_Donor 3 U - A_Mesenchymal Stem Cells    5.6
97486_Patient-09sk_skeletal muscle            0.0      94743_Donor 3 U - B_Mesenchymal Stem Cells    12.4
97487_Patient-09ut_uterus                     28.5     94730_Donor 3 AM - A_adipose                  22.5
97488_Patient-09pl_placenta                   7.2      94731_Donor 3 AM - B_adipose                  12.0
97492_Patient-10ut_uterus                     27.7     94732_Donor 3 AM - C_adipose                  8.0
97493_Patient-10pl_placenta                   21.0     94733_Donor 3 AD - A_adipose                  23.3
97495_Patient-11go_adipose                    7.4      94734_Donor 3 AD - B_adipose                  2.4
97496_Patient-11sk_skeletal muscle            3.0      94735_Donor 3 AD - C_adipose                  12.2
97497_Patient-11ut_uterus                     24.0     77138_Liver_HepG2untreated                    0.0
97498_Patient-11pl_placenta                   15.2     73556_Heart_Cardiac stromal cells (primary)   19.9
97500_Patient-12go_adipose                    15.6     81735_Small Intestine                         11.0
97501_Patient-12sk_skeletal muscle            13 3     72409_Kidney_Proximal Convoluted Tubule
97502_Patient-12ut_uterus                     36.6     82685_Small intestine_Duodenum                6.0
97503_Patient-12pl_placenta                   7.4      90650_Adrenal_Adrenocortical adenoma          1.3
94721_Donor 2 U - A_Mesenchymal Stem Cells    28.5     72410_Kidney_HRCE                             55.5
94722_Donor 2 U - B_Mesenchymal Stem Cells    18.9     72411_Kidney_HRE                              52.5
94723_Donor 2 U - C_Mesenchymal Stem Cells    43.2     73139_Uterus_Uterine smooth muscle cells      27.4



General_screening_panel_v1.4 Summary: Ag4264 Highest expression of this gene
is detected in the breast cancer T47D cell line (CT=27). High to moderate levels of expression of
this gene are also seen in a cluster of cancer cell lines derived from pancreatic, gastric, colon,
lung, liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain
cancers. Thus, expression of this gene could be used as a marker to detect the presence of
these cancers. Furthermore, therapeutic modulation of the expression or function of this gene
may be effective in the treatment of pancreatic, gastric, colon, lung, liver, renal, breast,
ovarian, prostate, squamous cell carcinoma, melanoma and brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at
moderate levels in pancreas, adipose, adrenal gland, thyroid, pituitary gland, skeletal muscle,
heart, fetal liver and the gastrointestinal tract. Therefore, therapeutic modulation of the
activity of this gene may prove useful in the treatment of endocrine/metabolically related
diseases, such as obesity and diabetes.

In addition, this gene is expressed at moderate levels in all regions of the central
nervous system examined, including amygdala, hippocampus, substantia nigra, thalamus,
cerebellum, cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene
product may be useful in the treatment of central nervous system disorders such as
Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and
depression.

Panel 5 Islet Summary: Ag4264 Highest expression of this gene is detected in islet
cells (CT=32.3). Moderate to low levels of expression of this gene are also seen in uterus,
mesenchymal stem cells, adipose and kidney.

This gene codes for the L-type calcium channel beta-3 subunit. The beta subunit of
voltage-dependent calcium channels contributes to the function of the calcium channel by
increasing peak calcium current, shifting the voltage dependencies of activation and
inactivation, modulating G protein inhibition and controlling the alpha-1 subunit membrane
targeting. Therefore, therapeutic modulation of this gene may be useful as a treatment for the
enhancement of insulin secretion in Type 2 diabetes.

B. NOV9b: BHLH PROTEIN DEC2 

Expression of full-length physical clone NOV9b was assessed using the primer-probe
set Ag6927, described in Table BA. Results of the RTQ-PCR runs are shown in Table BB.
Table BA. Probe Name Ag6927

Primers  Sequences                                    Length  Start Position  SEQ ID No
Forward  5'-acccagttcctgcccttc-3'                     18      1517            348
Probe    TET-5'-ctctgctcaagaagatccctcgcagc-3'-TAMRA   26      567             349
Reverse  5'-gcaaggattcagggagctt-3'                    19      602             350



Table BB. General_screening_panel_v1.6

Tissue Name

Rel. Exp.(%) Ag6927, Run 278700378

Tissue Name

Rel. Exp.(%) Ag6927, Run 278700378


AH innsp 


4.5 


| Renal ca. TK-10 


12.5 


Mplannma* T-k6KRf A^ T 


8.5 


[Bladder j 1 8.4 


Melanoma* HsfSRKf R^i T 


5.5 


jGastric ca. (liver met.) NC1-N87 


23.3 


Mplannma* M14 


0.0 


(Gastric ca. KATO III 


30.4 


Mplannma* 1 OYIMVI 
iviciallUllla LA_//\lIvi v 1 


11 


jColon ca. S W-948 


11.4 


Mplannma* <sJC-lv1FI 


22.5 


jcolon ca. SW480 


2.4 


oqUalllUUD LC1I LaTLIilUIrla 


10.0 


(Colon ca.* (SW480 met) SW620 


0.0 


TpqHc Ponl 

I vollo I SJXJl 


6.6 


[Colon ca. HT29 


41.2 


Pmctatp * fhrmp mf>t\ PP 1 
iIUMaLC La. ^UUIIC IIICIJ r ^-_> 


22.5 


(Colon ca.HCT-116 


14.4 


Prn<itatp Pnnl 


2.5 


jColon ca. CaCo-2 


10.0 


Ptappntu 
l lavvi l id 


4.8 


jColon cancer tissue 


14.9 


1 Itpmc Pnnl 
vjici ub ruui 


1.7 


Colon ca.SWl 116 


:0.0 


Ovarian ra OVPAR-** 


100.0 


|Colon ca. Coio-205 


0.0 


Ovarian ra ^K-OV-l 


5.1 


(Colon ca. SW-48 


2.5 


Wval lall la. V V_^/\tV~H 


0.0 


(Colon Pool 


9.9 


Ovarian ra OVPAR-^ 


46.7 


(Small Intestine Pool 


976 


Ovarian ra IOROV-1 
wcu icii 1 La. ivjrvL^' v i 


9.5 


(Stomach Pool 


~ ■ 

3.9 


Ovarian ra OVfAR-R 
w vol lall La. \J V V^/AIn. O 


14.6 


jBone Marrow Pool 


8.1 


Ovarv 
\j vai 


3.1 


jFetal Heart 


2.1 


Breast ra MCF-7 


0.0 


jHeart Pool 


1.4 


Breast ra MDA-MR-2^ I 


12.4 


(Lymph Node Pool 


2.3 


Breast ra RT S4Q 

uitcui La. L> I vJ"7 


1.9 


jFetal Skeletal Muscle 


4.7 


Breast ca T47D 


0.0 


jSkeletal Muscle Pool 


15.6 


Breast ca MDA-N 


0.0 


(Spleen Pool 


11.0 


Breast Pool 


3,9 


|Thymus Pool 


7.0 


Trachea 


3.1 


jCNS cancer (glio/astro) U87-MG 


43.2 




21.3 


iCNS cancer (glio/astro) U-l 18-MG 


13.5 


Feta! Lunt? 


3.6 


(CNS cancer (neuro;met) SK-N-AS 


1.6 


Lungca. NC1-N417 


0.0 


(CNS cancer (astro) SF-539 


10.2 


Lungca. LX-1 


1.1 


jCNS cancer (astro) SNB-75 


37.9 


Lung ca. NCI-H146 


0.0 


jCNS cancer (glio) SNB-19 


13.0 


Lungca. SHP-77 


0.0 


(CNS cancer (glio) SF-295 


57.4 


Lung ca. A549 


0.0 


(Brain (Amygdala) Pool 


36.6 


Liingca.NCI-H526 


3.1 


(Brain (cerebellum) 


66.4 


Lungca. NCI-H23 


6.3 


j Brai n (fetal) 


1.4 


Lung ca. NCI-H460 


1.6 


jBrain (Hippocampus) Pool 


45.4 


Lung ca. HOP-62 


46.3 


(Cerebral Cortex Pool 


40.6 


Lung ca. NCI-H522 


0.0 


(Brain (Substantia nigra) Pool 


26.2


Liver


0.0 


jBrain (Thalamus) Pool 


60.7 


Fetal Liver 


0.0 


i Brain (whole) 


14.5 • 


Liver ca. HepG2 


1.3 


'Spinal Cord Pool 


89.5 


Kidney Pool j21.9 


jAdrenal Gland 


15.8 


Fetal Kidney 


1.4- 


jPituitary gland Pool 


25.2 


Renal ca. 786-0 


6.3 


'Salivary Gland ]9.3 


Renal ca. A498 


0.0 


jThyroid (female) 


9.9 


Renal ca. ACHN 


12.9 


{Pancreatic ca. CAPAN2 


10.1 


Renal ca.UO-31 


80.7 


j Pancreas Pool 


14.7 



General_screening_panel_v1.6 Summary: Ag6927 Highest expression of this gene
is detected in the ovarian cancer OVCAR-3 cell line (CT=33). Significant expression of this
gene is also seen in a number of cell lines derived from brain, colon, renal, lung and ovarian
cancers. Therefore, expression of this gene may be used as a marker to detect these cancers.
Furthermore, therapeutic modulation of this gene may be useful in the treatment of these
cancers.
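
The relative expression percentages in Table BB and the single CT value quoted for the top sample can be related by a simple conversion. The sketch below assumes, as is common for this kind of RTQ-PCR presentation but not stated in this section, that each percentage equals 100 x 2^(CT_min - CT), i.e. ideal doubling per cycle with the highest-expressing sample set to 100%. The function names are hypothetical.

```python
import math


def relative_expression(ct, ct_min):
    """Percent expression relative to the sample with the lowest CT (assumes 2-fold per cycle)."""
    return 100.0 * 2.0 ** (ct_min - ct)


def estimated_ct(rel_exp_percent, ct_min):
    """Approximate CT implied by a relative expression value; the inverse of relative_expression."""
    if rel_exp_percent <= 0:
        raise ValueError("a value of 0.0 gives no finite CT estimate")
    return ct_min + math.log2(100.0 / rel_exp_percent)


# Example: OVCAR-3 is the 100.0% sample at CT=33; OVCAR-5 is listed at 46.7%.
ovcar5_ct = estimated_ct(46.7, ct_min=33.0)
```

Under this assumption a sample at 50% lies one cycle above the top sample, so with a top CT of 33 most samples in this panel would fall in the mid-to-high 30s.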

In addition, low levels of expression of this gene are also seen in most of the regions of
the central nervous system examined, including amygdala, hippocampus, thalamus,
cerebellum, cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene
product may be useful in the treatment of central nervous system disorders such as
Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and
depression.

C. NOV16a: FORKHEAD PROTEIN 03A 

Expression of gene NOV16a was assessed using the primer-probe set Ag3742,
described in Table CA. Results of the RTQ-PCR runs are shown in Table CB.
Table CA. Probe Name Ag3742

Primers  Sequences                                    Length  Start Position  SEQ ID No
Forward  5'-agaccctcaaactgacacaaga-3'                 22      1863            351
Probe    TET-5'-aaaaccctttgccaaatctgctctca-3'-TAMRA   26      1894            352
Reverse  5'-aaacggtatcactgtccacttg-3'                 22      1921            353



Table CB. Panel 5D






Tissue Name

Rel. Exp.(%) Ag3742, Run 169315028

Tissue Name

Rel. Exp.(%) Ag3742, Run 169315028


97457_Patient-02go_adipose |57.8 


94709_Donor 2 AM - A_adipose 


44.1 


97476_Patient-07sk_skeletal j 29 9 
muscle j 


94710J3onor 2 AM - B_adipose 

_ _ 


29.9 


97477_Patient-07ut_uterus |20.0 


9471 l_Donor 2 AM - C_adipose 


22.1 


97478_Patient-07pl_placenta ;32.3 


94712_Donor 2 AD - A_adipose 


41.8 


9748 l_Patient-08sk_skeletal 
muscie 


41.5 


94713JDonor 2 AD - B_adipose 


63.7 


07/1 £7 Dotiont HQ nt ntp rue 

y /^oz_r alien i-uoui_utcrus 


14.3 


94714_Donor 2 AD - C_adipose 


55.5 


97483J > atient-08pl_placenta 


. j94742 Donor 3 U - A Mesenchymal Stem 
jCells 


19.9 


Q7/18A Patient HOcL- cl'^l^t'al 
V /HOD rdllcni-UySK SKclclal 

muscle 


. 7 . :94743 Donor 3 U - B Mesenchymal Stem 
jCells 


21.0 




38.2 ]94730_Donor 3 AM - A_adipose 


52.1 


97488_Patient-09pl_placenta 


13.5 J94731 Donor 3 AM - B_adipose 


32.3 0 


97492 Patient- 1 Out uterus 


36.6 {94732_Donor 3 AM - C_adipose 


33.2 


97493_Patient- 1 Opl_placenta 


43.2 j94733_Donor 3 AD - A_adipose 


41.8 


97495_Patient-1 lgo_adipose 


36.6 

, 

66 4 i 


94734_Donor 3 AD - B_adipose 


19.9 
44.1 


97496_Patient-1 lsk_skeletal 
muscle 


94735_Donor 3 AD - C_adipose 


97497J>atient-l lut_uterus 


40.3 |77 1 38_Liver_HepG2untreated 


100.0 


97498_Patient-l lp!_placenta 


17.7 


73556_Heart_Cardiac stromal cells 
(primary) 


14.1 


97500_Patient- 1 2go_adipose 


44.8 


81735_Small Intestine 


47.6 


9750 1_Patient- 12skj>keletal 
muscle 


77 4 j72409JCidneyJ :> roximal Convoluted 
jTubule 


17.8 


97502_Patient- 1 2ut_uterus 


30. 1 ]82685_SmalI intestine_Duodenum 


20.9 


97503_Patient-J2pl_placcnta 


12.2 J90650 Adrenal Adrenocortical adenoma ]6.6 


94721_Donor2U- 
A_Mesenchymal Stem Cells 


23.8 172410 Kidney HRCE 143.5 


94722_Donor2U- 
BJVlesenchymal Stem Cells 


19.5 


724ll_Kidney_HRE 


49.0 


94723_Donor2U- 
CJvlesenchymal Stem Cells 


20.2 


73 l39_Uterus_Uterine smooth muscle cells 


30.6 



Panel 5D Summary: Ag3742 Highest expression of this gene is detected in the liver
HepG2 cell line (CT=28.8). This gene shows a wide expression in tissues with
metabolic/endocrine function. Moderate to low levels of expression of this gene are seen in
adipose, uterus, placenta, skeletal muscle, small intestine and kidney. Therefore, therapeutic
modulation of this gene may be useful in the treatment of endocrine/metabolically related
diseases, such as obesity and diabetes.

D. NOV18a: kinectin like 

Expression of gene NOV18a was assessed using the primer-probe set Ag6564,
described in Table DA. Results of the RTQ-PCR runs are shown in Tables DB and DC.
Table DA. Probe Name Ag6564

Primers  Sequences                                      Length  Start Position  SEQ ID No
Forward  5'-ttgcacctgttccattgaat-3'                     20      359             354
Probe    TET-5'-tccctaacactacttgaagtttcaacga-3'-TAMRA   28      380             355
Reverse  5'-tcttcaagcacaggcttttg-3'                     20      430             356



Table DB. AI_comprehensive panel_v1.0

Tissue Name

Rel. Exp.(%) Ag6564, Run 277314696

Tissue Name

Rel. Exp.(%) Ag6564, Run 277314696


ill 0967 COPD-F 


19.2 


1 12427 Match Control Psoriasis-F 


100.0 


»Tl0980 COPD-F 


20.4 


1 12418 Psoriasis-M 


5.2 


1110968 COPD-M 


27.0 


1 12723 Match Control Psoriasis-M 


13.1 


1 110977 COPD-M 


58.2 


1 12419 Psoriasis-M 


14.0 


{110989 Emphysema-F 


44.1 


1 12424 Match Control Psoriasis-M 


16.2 


jl 10992 Emphysema-F 


22.5 


112420 Psoriasis-M 


65.1 


jl 10993 Emphysema-F 


29.7 


1 12425 Match Control Psoriasis-M 


46.7 


jl 10994 Emphysema-F 

I <u.l_^j — : u-i — ■ 


14.3 


104689 (MF) OA Bone-Backus 


33.7 


Jl 10995 Emphysema-F 


40.1 


104690 (MF) Adj "Normal" Bone- 
Backus 


33.0 


jl 10996 Emphysema-F 


8.0 


104691 (MF) OA Synovium-Backus 


30.6 


1 10997 Asthma-M 


4.1 


104692 (BA) OA Cartilage-Backus 


13.9 | 


III 1001 Asthma-F 


32.1 


104694 (BA) OA Bone-Backus 


24.7 


1 11002 Asthma-F 

i 


42.3 


104695 (BA) Adj "Normal" Bone- 
Backus 


32.8 


(111003 Atopic Asthma-F 


38.2 


104696 (BA) OA Synovium-Backus 


26.6 


•11 1004 Atopic Asthma-F 


35.4 


1 04700 (SS) OA Bone-Backus 


26.2 


■111005 Atopic Asthma-F 


24.7 


104701 (SS) Adj "Normal" Bone- 
Backus 


23.5 


jl 11006 Atopic Asthma-F 


6.4 


1 04702 (SS) OA Synovium-Backus 


46.0 


|1 1 14)7 Allergy-M 


24.0 


1 1 7093 OA Cartilage Rep7 


34.4 


M 12347 Allergy-M 


0.0 


1 12672 OA Bone5 


24.5 


|112349 Normal Lung-F 


0.0 


11 2673 OA Synovium5 


14.5


112357 Normal Lung-F


48.0 


1 12674 OA Synovial Fluid cells5 


20.2 


\ 12354 Normal Lunp-M 


21.9 jl 17100 OA Cartilage Rep 14 


4.3 


1 12374 Crohns-F 


31.6 jl 12756 OA Bone9 


74.7 


M 12389 Match Control Crohns-F 

l 1 1 KJ J 1 ▼ i U Iw 1 1 V^v V/ till V/ 1 1 \J 1111 -3 1 


22.8 jl 12757 OA Synovium9 


33.4 


1 12375 Crohns-F 


24.1 jl 12758 OA Synovial Fluid Cells9 


17.4 


J 1 1 2732 Match Control Crohns-F 


13.4 jl 17125 RA Cartilage Rep2 


5.1 


j 1 1 2725 Crohns-M 


5.5 j 1 1 3492 Bone2 RA 


14.6 


jl 12387 Match Control Crohns-M 


7.2 jl 13493 Synovium2RA ]5.7 


1112378 Crohns-M 


0.0 jl 13494 Syn Fluid Cells R A 


11.5 


jl 12390 Match Control Crohns-M 


55.9 jll3499Cartilage4RA 


10 R 

I v.O 


112726 Crohns-M 


26.6 


1 13500 Bone4RA 


13.8 


1 12731 Match Control Crohns-M 


21.6 


113501 Synovium4 RA 


9.2 


112380 Ulcer Col-F 


39.8 


113502 Syn Fluid Cells4 RA 


9.9 


1 12734 Match Control Ulcer Col- 

f. . 


i 

42.3 jl 13495 Cartilage3 RA 


9.2 


11 2384 Ulcer Col-F 


49.7 jl 13496 Bone3 RA 


10.6 


1 12737 Match Control Ulcer Col- 
F 


3.2 jl 13497 Synovium3RA 

i 


8.2 


112386 Ulcer Col-F 


19.9 [113498 Syn Fluid Cells3 RA 


13.2 


1 12738 Match Control Ulcer Col- 
F 


] 

8.2 j 1 1 7 1 06 Normal Cartilage Rep20 


1.3 


112381 Ulcer Col-M 


0.0 jl 13663 Bone3 Normal 


0.0 


112735 Match Control Ulcer Col- 
M 


1.1 


1 1 3664 Synovium3 Normal 


0.0 


112382 Ulcer Col-M 


2 1 .3 j 1 1 3665 Syn Fluid Cel ls3 Normal 


0.0 


1 12394 Match Control Ulcer Col- . 
M 


7.2 il 17107 Normal Cartilage Rep22 

i 


5.3 


jl 12383 Ulcer Col-M 


34.9 J 13667 Bone4 Normal 


17.3 
18.9 


1 1 2736 Match Control Ulcer Col- 
M 


3.9 1 1 1 3668 Synovium4 Normal 


1 12423 Psoriasis-F 


17.8 jl 13669 Syn Fluid Cells4 Normal J25.3 



Table DC. General_screening_panel_v1.6

Rel. Exp.(%) columns: Ag6564, Run 277243357

Tissue Name                     Rel.    Tissue Name                         Rel.
                                Exp.(%)                                     Exp.(%)
Adipose                         19.6    Renal ca. TK-10                     31.9
Melanoma* Hs688(A).T            19.9    Bladder                             30.8
Melanoma* Hs688(B).T            21.5    Gastric ca. (liver met.) NCI-N87    53.2
Melanoma* M14                   8.4     Gastric ca. KATO III                39.2
Melanoma* LOXIMVI               15.0    Colon ca. SW-948                    6.2
Melanoma* SK-MEL-5              35.1    Colon ca. SW480                     47.0
Squamous cell carcinoma SCC-4   39.0    Colon ca.* (SW480 met) SW620        22.1
Testis Pool                     25.0    Colon ca. HT29                      11.8
Prostate ca.* (bone met) PC-3   87.1    Colon ca. HCT-116                   26.8
Prostate Pool                   17.8    Colon ca. CaCo-2                    28.9
Placenta                        1.5     Colon cancer tissue                 20.0
Uterus Pool                     7.5     Colon ca. SW1116                    16.6
Ovarian ca. OVCAR-3             48.3    Colon ca. Colo-205                  2.6
Ovarian ca. SK-OV-3             100.0   Colon ca. SW-48                     3.0
Ovarian ca. OVCAR-4             9.9     Colon Pool                          24.5
Ovarian ca. OVCAR-5             11.3    Small Intestine Pool                21.6
Ovarian ca. IGROV-1             15.9    Stomach Pool                        11.0
Ovarian ca. OVCAR-8             13.7    Bone Marrow Pool                    9.9
Ovary                           7.2     Fetal Heart                         34.2
Breast ca. MCF-7                32.8    Heart Pool                          17.4
Breast ca. MDA-MB-231           67.8    Lymph Node Pool                     29.3
Breast ca. BT 549               94.6    Fetal Skeletal Muscle               12.9
Breast ca. T47D                 17.8    Skeletal Muscle Pool                24.8
Breast ca. MDA-N                7.9     Spleen Pool                         15.4
Breast Pool                     26.8    Thymus Pool                         19.9
Trachea                         10.3    CNS cancer (glio/astro) U87-MG      71.2
Lung                            9.1     CNS cancer (glio/astro) U-118-MG    46.0
Fetal Lung                      49.7    CNS cancer (neuro;met) SK-N-AS      53.6
Lung ca. NCI-N417               4.2     CNS cancer (astro) SF-539           18.2
Lung ca. LX-1                   26.2    CNS cancer (astro) SNB-75           62.9
Lung ca. NCI-H146               5.4     CNS cancer (glio) SNB-19            16.7
Lung ca. SHP-77                 46.3    CNS cancer (glio) SF-295            34.4
Lung ca. A549                   27.2    Brain (Amygdala) Pool               8.7
Lung ca. NCI-H526               7.7     Brain (cerebellum)                  19.1
Lung ca. NCI-H23                40.3    Brain (fetal)                       8.0
Lung ca. NCI-H460               20.6    Brain (Hippocampus) Pool            12.4
Lung ca. HOP-62                 13.6    Cerebral Cortex Pool                6.1
Lung ca. NCI-H522               17.2    Brain (Substantia nigra) Pool       6.0
Liver                           0.0     Brain (Thalamus) Pool               17.4
Fetal Liver                     15.9    Brain (whole)                       4.9
Liver ca. HepG2                 14.6    Spinal Cord Pool                    16.6
Kidney Pool                     25.9    Adrenal Gland                       7.4
Fetal Kidney                    43.5    Pituitary gland Pool                15.8
Renal ca. 786-0                 31.2    Salivary Gland                      1.7
Renal ca. A498                  18.3    Thyroid (female)                    5.8
Renal ca. ACHN                  10.7    Pancreatic ca. CAPAN2               35.1
Renal ca. UO-31                 26.1    Pancreas Pool                       17.0

AI_comprehensive panel_v1.0 Summary: Ag6564 Highest expression of this gene
is detected in a matched control psoriasis sample (CT=24). This gene shows wide expression in this
panel, with high levels of expression in samples derived from bone, cartilage, synovium
and synovial fluid of osteoarthritis and rheumatoid arthritis patients, as well as in
samples derived from normal lung, COPD lung, emphysema, atopic asthma, asthma,
allergy, Crohn's disease (normal matched control and diseased), ulcerative colitis (normal
matched control and diseased), and psoriasis (normal matched control and diseased).
Therefore, therapeutic modulation of this gene product may ameliorate symptoms/conditions
associated with autoimmune and inflammatory disorders including psoriasis, allergy, asthma,
inflammatory bowel disease, rheumatoid arthritis and osteoarthritis.

General_screening_panel_v1.6 Summary: Ag6564 Highest expression of this gene
is seen in the ovarian cancer SK-OV-3 cell line (CT=24.4). High levels of expression of this gene
are also seen in a cluster of cancer cell lines derived from pancreatic, gastric, colon, lung, liver,
renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain cancers. Thus,
expression of this gene could be used as a marker to detect the presence of these cancers.
Furthermore, therapeutic modulation of the expression or function of this gene may be
effective in the treatment of pancreatic, gastric, colon, lung, liver, renal, breast, ovarian,
prostate, squamous cell carcinoma, melanoma and brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at high
levels in pancreas, adipose, adrenal gland, thyroid, pituitary gland, skeletal muscle, heart,
fetal liver and the gastrointestinal tract. Therefore, therapeutic modulation of the activity of
this gene may prove useful in the treatment of endocrine/metabolically related diseases, such
as obesity and diabetes.

Interestingly, this gene is expressed at much higher levels in fetal liver (CT=27) than
in adult liver (CT=40). This observation suggests that expression of this gene can
be used to distinguish fetal from adult liver. In addition, the relative overexpression of this
gene in fetal tissue suggests that the protein product may enhance liver growth or
development in the fetus and thus may also act in a regenerative capacity in the adult.
Therefore, therapeutic modulation of the protein encoded by this gene could be useful in
treatment of liver related diseases.
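
The fetal-versus-adult comparison above rests on a difference in CT values. As a rough sketch only, assuming ideal amplification (product doubles each cycle) and assuming that a CT of 40 marks the end of the run rather than a measured threshold crossing, the fold difference implied by two CT values could be estimated as follows. The function name, the 40-cycle cutoff and the efficiency parameter are illustrative assumptions and are not taken from the described method.

```python
def fold_difference(ct_high_expr, ct_low_expr, max_cycles=40.0, efficiency=2.0):
    """Estimate how many times more template the lower-CT sample contains.

    Assumptions (hypothetical, for illustration only):
      * each PCR cycle multiplies the product by `efficiency` (2.0 = ideal doubling)
      * a CT at or above `max_cycles` means no signal was detected, so the
        result is only a lower bound on the true fold difference
    """
    if ct_high_expr > ct_low_expr:
        raise ValueError("ct_high_expr must be the smaller (earlier) CT value")
    fold = efficiency ** (ct_low_expr - ct_high_expr)
    is_lower_bound = ct_low_expr >= max_cycles
    return fold, is_lower_bound


# CT values quoted in the summary: fetal liver CT=27, adult liver CT=40.
fold, lower_bound = fold_difference(27.0, 40.0)
qualifier = "at least " if lower_bound else ""
print(f"fetal liver expression is {qualifier}{fold:.0f}-fold higher than adult liver")
```

Because the adult value sits at the assumed detection limit, the number is best read as a lower bound rather than a measured ratio.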

In addition, this gene is expressed at high levels in all regions of the central nervous
system examined, including amygdala, hippocampus, substantia nigra, thalamus, cerebellum,
cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene product may
be useful in the treatment of central nervous system disorders such as Alzheimer's disease,
Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and depression.
E. NOV20a: XIN 

Expression of gene NOV20a was assessed using the primer-probe set Ag3459,
described in Table EA. Results of the RTQ-PCR runs are shown in Tables EB, EC and ED.

Table EA. Probe Name Ag3459

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-acacaactggctcaggacatag-3' | 22 | 4705 | 357
Probe | TET-5'-ctgctccaccagaaaggtgtccaag-3'-TAMRA | 25 | 4735 | 358
Reverse | 5'-gtgatgtccttcttcccagttt-3' | 22 | 4763 | 359



Table EB. General_screening_panel_v1.4

Tissue Name | Rel. Exp.(%) Ag3459, Run 217066335 | Tissue Name | Rel. Exp.(%) Ag3459, Run 217066335
Adipose | 1.8 | Renal ca. TK-10 | 0.0
Melanoma* Hs688(A).T | 0.0 | Bladder | [illegible]
Melanoma* Hs688(B).T | 0.0 | Gastric ca. (liver met.) NCI-N87 | 0.0
Melanoma* M14 | 0.0 | Gastric ca. KATO III | 0.0
Melanoma* LOXIMVI | 6.0 | Colon ca. SW-948 | 0.0
Melanoma* SK-MEL-5 | 0.0 | Colon ca. SW480 | 0.0
Squamous cell carcinoma SCC-4 | 0.0 | Colon ca.* (SW480 met) SW620 | 0.0
Testis Pool | 0.0 | Colon ca. HT29 | 0.0
Prostate ca.* (bone met) PC-3 | 0.0 | Colon ca. HCT-116 | 0.0
Prostate Pool | 0.3 | Colon ca. CaCo-2 | 0.0
Placenta | 0.0 | Colon cancer tissue | [missing]
Uterus Pool | 0.0 | Colon ca. SW1116 | 0.0
Ovarian ca. OVCAR-3 | 0.0 | Colon ca. Colo-205 | 0.0
Ovarian ca. SK-OV-3 | 0.0 | Colon ca. SW-48 | 0.0
Ovarian ca. OVCAR-4 | 0.0 | Colon Pool | 0.0
Ovarian ca. OVCAR-5 | 0.0 | Small Intestine Pool | 0.0
Ovarian ca. IGROV-1 | 0.0 | Stomach Pool | 0.0
Ovarian ca. OVCAR-8 | 0.0 | Bone Marrow Pool | 0.0
Ovary | 0.0 | Fetal Heart | 19.9
Breast ca. MCF-7 | 0.0 | Heart Pool | 4.2
Breast ca. MDA-MB-231 | 0.0 | Lymph Node Pool | 0.0



471 



WO 03/023002 




PCT/US02/28539 



Breast ca. BT 549 | 0.0 | Fetal Skeletal Muscle | 12.4
Breast ca. T47D | 0.0 | Skeletal Muscle Pool | 100.0
Breast ca. MDA-N | 0.0 | Spleen Pool | 0.0
Breast Pool | 0.0 | Thymus Pool | 0.0
Trachea | 0.0 | CNS cancer (glio/astro) U87-MG | 0.0
Lung | 0.0 | CNS cancer (glio/astro) U-118-MG | 0.0
Fetal Lung | 0.2 | CNS cancer (neuro;met) SK-N-AS | 0.0
Lung ca. NCI-N417 | 0.0 | CNS cancer (astro) SF-539 | 0.0
Lung ca. LX-1 | 0.0 | CNS cancer (astro) SNB-75 | 0.0
Lung ca. NCI-H146 | 0.0 | CNS cancer (glio) SNB-19 | 0.0
Lung ca. SHP-77 | 0.0 | CNS cancer (glio) SF-295 | 0.0
Lung ca. A549 | 0.0 | Brain (Amygdala) Pool | 0.0
Lung ca. NCI-H526 | 0.0 | Brain (cerebellum) | 0.0
Lung ca. NCI-H23 | 0.0 | Brain (fetal) | 0.0
Lung ca. NCI-H460 | 0.0 | Brain (Hippocampus) Pool | 0.0
Lung ca. HOP-62 | 0.0 | Cerebral Cortex Pool | 0.0
Lung ca. NCI-H522 | 0.0 | Brain (Substantia nigra) Pool | 0.0
Liver | 0.0 | Brain (Thalamus) Pool | 0.0
Fetal Liver | 0.0 | Brain (whole) | 0.0
Liver ca. HepG2 | 0.0 | Spinal Cord Pool | 0.0
Kidney Pool | 0.0 | Adrenal Gland | 0.0
Fetal Kidney | 0.0 | Pituitary gland Pool | 0.0
Renal ca. 786-0 | 0.0 | Salivary Gland | 0.1
Renal ca. A498 | 0.0 | Thyroid (female) | 0.0
Renal ca. ACHN | 0.0 | Pancreatic ca. CAPAN2 | 0.0
Renal ca. UO-31 | 0.0 | Pancreas Pool | 0.0



Table EC. Panel 4D 



Tissue Name | Rel. Exp.(%) Ag3459, Run 166417097 | Tissue Name | Rel. Exp.(%) Ag3459, Run 166417097


Secondary Thl act 


0.2 


HUVEC IL-lbeta 


0.0 


Secondary Th2 act 


7.9 


HUVEC IFN gamma 


0.0 


Secondary Trl act 


7.7 s 


HUVEC TNF alpha + IFN gamma 


0.0 


Secondary Thl rest 


0.0 


HUVEC TNF alpha + IL4 


0.0 


Secondary Th2 rest 


0.0 


HUVEC IL- 11 


0.0 


Secondary Trl rest 


0.0 


Lung Microvascular EC none 


0.0 


Primary Th 1 act 


0.1 


Lung Microvascular EC TNFalpha + IL- ! 
I beta 


0.0 


Primary Th2 act 


2.2 


Microvascular Dermal EC none 


0.0 
Primary Tr1 act


0.6 


Microsvasular Dermal EC TNFalpha + 
IL-lbeta 


0 0 

v.v 


Primary Thl rest 


0.1 


Bronchial epithelium TNFalpha + 
I L 1 beta 


0.0 


Primary Th2 rest 


0.0 


Small airway epithelium none 


0.0 


Primary Trl rest 


0.0 


Small airway epithelium TNFalpha + IL- 
Ibeta 


0.0 


CD45RA CD4 lymphocyte act 


0.2 


Coronery artery SMC rest 


0.0 


CD45RO CD4 lymphocyte act 


0.5 


Coronery artery SMC TNFalpha + IL- 
lbeta 


0.0 


CD8 lymphocyte act 


0.6 


Astrocytes rest 


{ 0.0 


Secondary CD8 lymphocyte rest 


1.1 


Astrocytes TNFalpha + IL-1 beta 


2.4 


Secondary CD8 lymphocyte act 


7.3 


KU-812 (Basophil) rest 


0.0 


CD4 lymphocyte none 


0.0 


KU-812 (Basophil) PMA/ionomycm 
CCD1 106 (Keratinocytes) none 


63.7 


2ry Thl/Th2/Trl_anti-CD95 


0.0 


0.0 


LAK cells rest 


2.3 


\^\~>ui i \jo ^Pveraiinocyieiy i iNraipna < 
IL-lbeta 


0.3 


LAK cells IL-2 


0.1 


Liver cirrhosis 


1.1 


LAK cells IL-2+IL- 12 


0.7 


Lupus kidney 


0.0 


LAK cells IL-2+IFN gamma 


2.2 


NCI-H292 none 


0.0 


LAKcellsIL-2+IL-I8 


1.2 


NCI-H292 IL-4 


0.0 


LAK cells PMA/ionomycin 


100.0 


NCI-H292 IL-9 


0.0 


NK Cells IL-2 rest 


0.1 


NCI-H292IL-13 


0.0 


Two Way MLR 3 day 


1.5 


NCI-H292 I FN gamma 


0.0 


Two Way MLR 5 day 


7.9 


H PA EC none 


0.0 


Two Way MLR 7 day 


3.8 . j 


HPAEC TNF alpha + IL-1 beta 


0.1 


PBMC rest 


0.0 


Lung fibroblast none 


0.3 


IPRMP PWM 

ilDlvl^ 1 W 1VI 


0 7 


1 lino fihrnKlact 1 MP alnhn -4- II 1 K*»tn 
LjLII lid, 1 ILM UUIdol J INF dipild ' 1 L<- 1 Utld 




PRMf PHA-1 




I lino fihrnhlaQT II -A 


1 .o 


Rainn<; fR rell^ none 


0.0 


I lint* fibroblast II -Q 


V.J 


RamnQ fR rein ionnmvr in 

ivuu I WD \*** •'^•11/ ivliuiliyvlll 

i ' ' 


0.0 


1 iino flhrohlast II - 1 1 


1 0 

I .V 


'R Ivmnhnrvtes PWM 


1.7 


1 lino fihrnhlast IFM oammfl 




j R lvmnhnrvtes CD40I and II -4 


0 2 

VS. 4. 


Dermal fihrnhlast PPO 1070 rest 


o i 


FOI -1 dhcAMP 


0 0 


Dermal fihrr>hla<;t CPD1070 TNF alnlia 

Lvt 1 Midi 1 1 LFI UUI do I L/ IV//U 1 IN 1 Cll Ul lu 


v.O 


EOL-i dbcAMP 
PMA/ionomycin 


32.5 


Dermal fibroblast CCD1070 IL-1 beta 


0.2 


Dendritic cells none 


0.5 


Dermal fibroblast IFN gamma 


0.2 


Dendritic cells LPS 


1.2 


Dermal fibroblast IL-4 


0.0 


Dendritic cells anti-CD40 


0.1 


IBD Colitis 2 


0.4 


Monocytes rest 


0.0 


IBD Crohn's 


0.3 


Monocytes LPS 


0.0 


Colon 


2.1 


Macrophages rest 


0.2 


Lung 


0.6 


Macrophages LPS 


8.8 


Thymus 


0.0 
HUVEC none


0.0 


Kidney |0.3 


jHUVEC starved 


0.0 





Table ED. general oncology screening panel_v_2.4

Tissue Name | Rel. Exp.(%) Ag3459, Run 267242232 | Tissue Name | Rel. Exp.(%) Ag3459, Run 267242232


Colon cancer 1 


15.9 


Bladder cancer NAT 2 


0.0 


Colon cancer NAT 1 


jo.o 


Bladder cancer NAT 3 ;2. 1 


Colon cancer 2 


■ I 5 - 3 


Bladder cancer NAT 4 ]0.0 


Colon cancer NAT 2 


p 


Prostate adenocarcinoma 1 ;7.5 


Colon cancer 3 


j8.0 


Prostate adenocarcinoma 2 ;0.0 


Colon cancer NAT 3 


iO.O 


Prostate adenocarcinoma 3 


lO.O 


Colon malignant cancer 4 


124.1 


Prostate adenocarcinoma 4 


; 

7.4 


Colon normal adjacent tissue 4 


12.4 


Prostate cancer NAT 5 


; 100.0 


Lung cancer 1 


]60.3 


Prostate adenocarcinoma 6 


15.1 


Lung NAT 1 


11.4 


Prostate adenocarcinoma 7 


|0.0 


Lung cancer 2 


171.7 


Prostate adenocarcinoma 8 


:o.o 


Lung NAT 2 jl.5 


Prostate adenocarcinoma 9 jO.O 


Squamous cell carcinoma 3 


]4.4 


Prostate cancer NAT 1 0 


lo.o 


Lung NAT 3 


jO.O 


Kidney cancer 1 


0.0 


metastatic melanoma 1 


]6.0 


KidneyNAT 1 


|1.5 


Melanoma 2 


8.7 


Kidney cancer 2 j9. 1 


Melanoma 3 


lo.o 


Kidney NAT 2 ;6.0 


metastatic melanoma 4 


~0.9 


Kidney cancer 3 -62 


metastatic melanoma 5 


110.4 


Kidney NAT 3 


jo.o 


Bladder cancer 1 


)4.4 


Kidney cancer 4 


?5.4 


Bladder cancer NAT 1 


jo.o 


Kidney NAT 4 




Bladder cancer 2 


jo.o 







General_screening_panel_v1.4 Summary: Ag3459 Highest expression of this gene
is detected in skeletal muscle (CT=23.7). Interestingly, expression of this gene is higher in
adult as compared to fetal skeletal muscle (CT=26.7). Therefore, expression of this gene may
be used to distinguish between adult and fetal skeletal muscle.
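
The CT values quoted above can be related to the "Rel. Exp.(%)" column of Table EB. The following is a minimal sketch, assuming that relative expression is scaled to the sample with the lowest CT (set to 100%) and that each cycle corresponds to a two-fold difference in template; the function name is hypothetical and the scaling rule is an assumption, not a statement of how the tabulated values were actually computed.

```python
def relative_expression(ct_values):
    """Scale every sample to the sample with the lowest CT (100%).

    Assumes ideal doubling per cycle, so a sample whose CT is n cycles
    later than the reference is reported as 100 / 2**n percent.
    """
    ct_min = min(ct_values.values())
    return {name: 100.0 * 2.0 ** (ct_min - ct) for name, ct in ct_values.items()}


# CT values taken from the summary text for probe set Ag3459.
cts = {"Skeletal Muscle Pool": 23.7, "Fetal Skeletal Muscle": 26.7}
rel = relative_expression(cts)
for name, pct in rel.items():
    print(f"{name}: {pct:.1f}%")
```

Under these assumptions a 3-cycle difference corresponds to about 12.5%, which is close to the 12.4 listed for fetal skeletal muscle in Table EB; the small gap is what one would expect from CT values rounded to one decimal place.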

Moderate to low levels of expression of this gene are also seen in tissues with
metabolic functions including adipose, skeletal muscle, heart, and liver. Therefore,
therapeutic modulation of the activity of this gene may prove useful in the treatment of
metabolically related diseases, such as obesity and diabetes.

Panel 4D Summary: Ag3459 Highest expression of this gene is detected in
PMA/ionomycin stimulated LAK cells (CT=24.9). Moderate to low levels of expression of
this gene are seen in activated polarized T cells, memory and naive T cells, activated B cells,
two way MLR, LAK, eosinophils, dendritic cells, lung and dermal fibroblasts, colon, lung,
kidney, liver cirrhosis and lupus kidney. Interestingly, expression of this gene is upregulated
in cytokine activated LAK cells, polarized T cells, PBMC, eosinophils, macrophage,
basophils, keratinocytes and HPAEC endothelial cells. Therefore, modulation of the gene
product with a functional therapeutic may lead to the alteration of functions associated with
these cell types and lead to improvement of the symptoms of patients suffering from
autoimmune and inflammatory diseases such as asthma, allergies, inflammatory bowel
disease, lupus erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis.

general oncology screening panel_v_2.4 Summary: Ag3459 Highest expression of
this gene is detected in a normal adjacent prostate sample (CT=32.7). Expression of this gene
is higher in normal tissue as compared to prostate cancer (CTs=36-40). Therefore, expression
of this gene may be used to distinguish the cancerous region from normal prostate.

In addition, moderate to low levels of expression of this gene are also seen in malignant
colon cancer and lung cancers. Expression of this gene is higher in cancer as compared to the
corresponding normal adjacent tissue. Therefore, expression of this gene may be used as a
marker to detect the presence of colon and lung cancer. In addition, therapeutic modulation of
this gene may be useful in the treatment of these cancers.

F. NOV21a: PROSTATIC BINDING PROTEIN 

Expression of full-length physical clone NOV21a was assessed using the primer-probe
set Ag4464, described in Table FA. Results of the RTQ-PCR runs are shown in Tables
FB, FC and FD.

Table FA. Probe Name Ag4464

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-cttgattcagggaagctctaca-3' | 22 | 177 | 360
Probe | TET-5'-ctcccagcaggaaggatcccaaat-3'-TAMRA | 24 | 223 | 361
Reverse | 5'-aggaaatgatgccattctctgt-3' | 22 | 247 | 362
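
The entries of Table FA lend themselves to a simple consistency check. The sketch below is illustrative only: it assumes that each "Start Position" is the lowest template coordinate covered by the oligonucleotide (including for the reverse primer), and the `Oligo` class and helper functions are hypothetical names introduced here, not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oligo:
    name: str
    sequence: str  # written 5'->3' as printed in Table FA
    start: int     # value of the "Start Position" column


def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in the oligonucleotide."""
    seq = sequence.lower()
    return 100.0 * sum(base in "gc" for base in seq) / len(seq)


def amplicon_length(forward: Oligo, reverse: Oligo) -> int:
    """Span from the forward primer start to the end of the reverse primer site.

    Assumption: reverse.start is the lowest template coordinate that the
    reverse primer covers.
    """
    return reverse.start + len(reverse.sequence) - forward.start


def probe_between_primers(forward: Oligo, probe: Oligo, reverse: Oligo) -> bool:
    """True if the probe site lies between the two primer sites without overlap."""
    return (forward.start + len(forward.sequence) <= probe.start
            and probe.start + len(probe.sequence) <= reverse.start)


forward = Oligo("Forward", "cttgattcagggaagctctaca", 177)
probe = Oligo("Probe", "ctcccagcaggaaggatcccaaat", 223)
reverse = Oligo("Reverse", "aggaaatgatgccattctctgt", 247)

for oligo in (forward, probe, reverse):
    print(f"{oligo.name}: length {len(oligo.sequence)}, GC {gc_percent(oligo.sequence):.1f}%")
print("amplicon length:", amplicon_length(forward, reverse))
print("probe between primers:", probe_between_primers(forward, probe, reverse))
```

The computed lengths can be compared against the "Length" column as a quick guard against transcription errors in the sequences.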



Table FB. General_screening_panel_v1.4

Tissue Name | Rel. Exp.(%) Ag4464, Run 222653242 | Tissue Name | Rel. Exp.(%) Ag4464, Run 222653242


iAdinose 


9.0 


Renal ca. TK-10 


ZD. / 


jMeianoma* Hs688(A).T 


21.0 


Bladder 


1 A O 


Melanoma* Hs688CB > > T 


18.4 


Gastric ca. (liver met.) NCI-N87 


1£ 1 
I O.j 


Melanoma* M14 


34.4 


Gastric ca. KATO III 


jj.O 


Melanoma* LOXIMVI 


29.9 


Colon ca. SW-948 


lo.o 


Melanoma* SK-MEI -S 


60.3 


Colon ca. SW480 


1A A 


Sniiamnns cell rarrinnma 


7.6 jCoIon ca.* (SW480 met) SW620 


Zj.j 


Tpctic Pool 


9.1 jColon ca. HT29 




Prostate ca * f hnnp mph Pf^-^ 


27.5 Colon ca. HCT-116 


ia o 


Prostate Pool 

1 1 VJitilw 1 \J\J I 


6.8 jColonca. CaCo-2 


I J.Z 


Placenta 


3 .8 [Colon cancer tissue 


1 Q O 


1 IteriK Pnol 


5.4 jColonca. SWI116 


lz.z 


Ovarian ca OVCAR-"} 

v»y ▼ ut i cm i veil V-' t \— r/ \ ix J 


7.3 {Colon ca. Coio-205 




Ovarian ca SK-OV-1 


31.9 


Colon ca. SW-48 


15.7 


Ovarian ca OVCAR-4 


15.4 ]ColonPool 


13.1 


Ovarian ra OVPAR-S 

Veil lull 1>U. \J V V^/A IX «J 


25.2 


Small Intestine Pool 


r 7.5 • ' 


Ovarian ca lOROV-l 

wai lull VU. 1 VJ Ivv^/ V I 


19.6 


Stomach Pool 


5.5 


Ovarian ca OVCAR-R 

w v cii lull bu. \«y v vrti\ O 


11.7 


Bone Marrow Pool 


4.7 


Ovary 


15.8 


Fetal Heart 


9.6 


Breast ca MCF-7 

L>l VUJl VC4. 1 ▼ I i / 


42.3 


Heart Pool 


11.5 


Breast ca MDA-MB-231 


30.8 


Lymph Node Pool 


11.5 


Breast ca BT 549 


27.9 


Fetal Skeletal Muscle J3.9 


Breast ca T47D 


59.5 


Skeletal Muscle Pool [l7.3 


Breast ca. MDA-N 


14.3 jSpleenPool 11.7 


Breast Pool 


10.6 (Thymus Pool 6.5 


Trachea 


1 2.9 jCNS cancer (glio/astro) U87-MG ]28. 1 


Lune 


3.9 jCNS cancer (glio/astro) U-J 1 8-MG 


28.9 


Fetal Lung 


12.5 jCNS cancer (neuro;met) SK-N-AS 


26.1 


Lungca.NCl-N417 


1 3.3 [CNS cancer (astro) SF-539 


8.5 


Lune ca LX-1 


1 1 .0 jCNS cancer (astro) SNB-75 


32.1 


lung ca. NCI-HI 46 


14.9 jCNS cancer (glio)SNB- 19 


21.0 


|Eungca.SHP-77 


35.6 jCNS cancer (glio) SF-295 121.0 


Lung ca. A549 


24.7 (Brain (Amygdala) Pool 


31.9 


Lungca. NCI-H526 


1 8.6 jBrain (cerebellum) 


74.2 


lungca. NCI-H23 


15.8 {Brain (fetal) 


23.2 


Lung ca. NC1-H460 


7.7 jBrain (Hippocampus) Pool 


36.3 


Lung ca. HOP-62 


7. 1 (Cerebral Cortex Pool 


58.6 


[Cling ca. NCI-H522 


1 7.4 JBrain (Substantia nigra) Pool 


44.1 


Liver 


1 4.4 jBrain (Thalamus) Pool 


54.7 
Fetal Liver


48.6 


jBrain (whole) 


30.8 


Liver ca. HepG2 


39.8 


jSpinal Cord Pool 


47.6 


Kidney Pool 


16.7 


{Adrenal Gland 


100.0 


Fetal Kidney 


8.6 


{Pituitary gland Pool 


7.4 


Renal ca. 786-0 


26.8 


|Salivary Gland 


9.6 


Renal ca. A498 


5.0 


(Thyroid (female) 


31.0 


Renal ca. ACHN 


14.7 


[Pancreatic ca. CAPAN2 


6.7 


Renal ca. UO-3 1 


23.3 


jPancreas Pool 


14.7 | 



Table FC. Panel CNS_1

Tissue Name | Rel. Exp.(%) Ag4464, Run 191785915 | Tissue Name | Rel. Exp.(%) Ag4464, Run 191785915
BA4 Control | 48.6 | BA17 PSP | 38.2
BA4 Control2 | 49.7 | BA17 PSP2 | 16.0
BA4 Alzheimer's2 | 7.9 | Sub Nigra Control | 57.4
BA4 Parkinson's | 54.0 | Sub Nigra Control2 | 37.1
BA4 Parkinson's2 | 95.9 | Sub Nigra Alzheimer's2 | 27.5
BA4 Huntington's | 45.4 | Sub Nigra Parkinson's2 | 93.3
BA4 Huntington's2 | 13.7 | Sub Nigra Huntington's | 94.6
BA4 PSP | 18.0 | Sub Nigra Huntington's2 | 51.1
BA4 PSP2 | 50.0 | Sub Nigra PSP2 | 13.4
BA4 Depression | 20.7 | Sub Nigra Depression | 12.2
BA4 Depression2 | 10.4 | Sub Nigra Depression2 | 14.5
BA7 Control | 66.0 | Glob Palladus Control | 16.2
BA7 Control2 | 45.1 | Glob Palladus Control2 | 16.5
BA7 Alzheimer's2 | 12.0 | Glob Palladus Alzheimer's | 29.9
BA7 Parkinson's | 22.1 | Glob Palladus Alzheimer's2 | 12.6
BA7 Parkinson's2 | 47.0 | Glob Palladus Parkinson's | 84.7
BA7 Huntington's | 60.7 | Glob Palladus Parkinson's2 | 27.0
BA7 Huntington's2 | 39.2 | Glob Palladus PSP | 8.4
BA7 PSP | 56.6 | Glob Palladus PSP2 | 14.6
BA7 PSP2 | 55.9 | Glob Palladus Depression | 11.3
BA7 Depression | 15.1 | Temp Pole Control | 16.7
BA9 Control | 47.3 | Temp Pole Control2 | 72.7
BA9 Control | 85.9 | Temp Pole Alzheimer's | 12.0
BA9 Alzheimer's | 12.9 | Temp Pole Alzheimer's2 | 8.4
BA9 Alzheimer's2 | 27.0 | Temp Pole Parkinson's | 39.0
BA9 Parkinson's | 33.9 | Temp Pole Parkinson's2 | 47.0
BA9 Parkinson's2 | 66.0 | Temp Pole Huntington's | 57.4
BA9 Huntington's | 76.8 | Temp Pole PSP | 6.9
BA9 Huntington's2 | 28.1 | Temp Pole PSP2 | 12.3
BA9 PSP | 25.2 | Temp Pole Depression2 | [illegible]
BA9 PSP2 | 8.0 | Cing Gyr Control | 66.9
BA9 Depression | 14.2 | Cing Gyr Control2 | 40.3
BA9 Depression2 | 11.8 | Cing Gyr Alzheimer's | 44.4
BA17 Control | 48.6 | Cing Gyr Alzheimer's2 | 14.4
BA17 Control2 | 57.8 | Cing Gyr Parkinson's | 33.0
BA17 Alzheimer's2 | 8.2 | Cing Gyr Parkinson's2 | 52.5
BA17 Parkinson's | 38.7 | Cing Gyr Huntington's | 100.0
BA17 Parkinson's2 | 49.3 | Cing Gyr Huntington's2 | 33.7
BA17 Huntington's | 40.1 | Cing Gyr PSP | 36.6
BA17 Huntington's2 | 15.6 | Cing Gyr PSP2 | 15.7
BA17 Depression | 13.3 | Cing Gyr Depression | 10.2
BA17 Depression2 | 33.7 | Cing Gyr Depression2 | 17.6



Table FD. Panel CNS_1.1

Tissue Name | Rel. Exp.(%) Ag4464, Run 195308648 | Tissue Name | Rel. Exp.(%) Ag4464, Run 195308648
Cing Gyr Depression2 | 22.8 | BA17 PSP2 | 24.8
Cing Gyr Depression | 12.9 | BA17 PSP | 46.7
Cing Gyr PSP2 | 14.6 | BA17 Huntington's2 | 14.5
Cing Gyr PSP | 33.9 | BA17 Huntington's | 45.4
Cing Gyr Huntington's2 | 30.6 | BA17 Parkinson's2 | 49.3
Cing Gyr Huntington's | 97.9 | BA17 Parkinson's | 33.2
Cing Gyr Parkinson's2 | 66.9 | BA17 Alzheimer's2 | 9.0
Cing Gyr Parkinson's | 47.0 | BA17 Control2 | 54.7
Cing Gyr Alzheimer's2 | 15.6 | BA17 Control | 46.3
Cing Gyr Alzheimer's | 47.6 | BA9 Depression2 | 14.5
Cing Gyr Control | 43.2 | BA9 Depression | 14.0
Cing Gyr Control | 73.2 | BA9 PSP2 | 10.8
Temp Pole Depression2 | 11.0 | BA9 PSP | 7.6
Temp Pole PSP2 | 14.1 | BA9 Huntington's2 | 24.3
Temp Pole PSP | 8.1 | BA9 Huntington's | 88.3
Temp Pole Huntington's | 46.0 | BA9 Parkinson's2 | 77.4
Temp Pole Parkinson's2 | 48.0 | BA9 Parkinson's | 45.1
Temp Pole Parkinson's | 36.6 | BA9 Alzheimer's2 | 24.0
Temp Pole Alzheimer's2 | 14.4 | BA9 Alzheimer's | 14.2
Temp Pole Alzheimer's | 14.6 | BA9 Control2 | 94.0
Temp Pole Control2 | 70.7 | BA9 Control | 48.0
Temp Pole Control | 17.9 | BA7 Depression | 12.2
Glob Palladus Depression | 11.0 | BA7 PSP2 | 62.4
Glob Palladus PSP2 | 15.8 | BA7 PSP | 59.0
Glob Palladus PSP | 10.4 | BA7 Huntington's2 | 42.9
Glob Palladus Parkinson's2 | 28.9 | BA7 Huntington's | 55.1
Glob Palladus Parkinson's | 88.9 | BA7 Parkinson's2 | 33.7
Glob Palladus Alzheimer's2 | 13.6 | BA7 Parkinson's | 23.8
Glob Palladus Alzheimer's | 37.4 | BA7 Alzheimer's2 | 9.8
Glob Palladus Control2 | 13.3 | BA7 Control2 | 42.9
Glob Palladus Control | 17.0 | BA7 Control | 59.0
Sub Nigra Depression2 | 18.2 | BA4 Depression2 | 12.7
Sub Nigra Depression | 14.6 | BA4 Depression | 26.8
Sub Nigra PSP2 | 14.5 | BA4 PSP2 | 53.2
Sub Nigra Huntington's2 | 50.3 | BA4 PSP | 22.4
Sub Nigra Huntington's | 89.5 | BA4 Huntington's2 | 15.2
Sub Nigra Parkinson's2 | 95.3 | BA4 Huntington's | 51.8
Sub Nigra Alzheimer's2 | 25.7 | BA4 Parkinson's2 | 100.0
Sub Nigra Control2 | 39.0 | BA4 Parkinson's | 58.6
Sub Nigra Control | 46.7 | BA4 Alzheimer's2 | 11.2
BA17 Depression2 | 33.0 | BA4 Control | 62.9
BA17 Depression | 12.9 | BA4 Control | 49.7



General_screening_panel_v1.4 Summary: Ag4464 Highest expression of this gene
is seen in adrenal gland (CT=23.4). In addition, this gene is also expressed at high levels in
pancreas, adipose, thyroid, pituitary gland, skeletal muscle, heart, liver and the
gastrointestinal tract. Therefore, therapeutic modulation of the activity of this gene may prove
useful in the treatment of endocrine/metabolically related diseases, such as obesity and
diabetes.

In addition, this gene is expressed at high levels in all regions of the central nervous
system examined, including amygdala, hippocampus, substantia nigra, thalamus, cerebellum,
cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene product may
be useful in the treatment of central nervous system disorders such as Alzheimer's disease,
Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and depression.

High levels of expression of this gene are also seen in a cluster of cancer cell lines
derived from pancreatic, gastric, colon, lung, liver, renal, breast, ovarian, prostate, squamous
cell carcinoma, melanoma and brain cancers. Thus, expression of this gene could be used as a
marker to detect the presence of these cancers. Furthermore, therapeutic modulation of the
expression or function of this gene may be effective in the treatment of pancreatic, gastric,
colon, lung, liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and
brain cancers.

Panel CNS_1 Summary: Ag4464 This panel confirms the expression of this gene at
significant levels in the brains of an independent group of individuals. Please see Panel 1.4
for a discussion of the potential use of this gene in treatment of central nervous system
disorders.

Panel CNS_1.1 Summary: Ag4464 This panel confirms the expression of this gene
at significant levels in the brains of an independent group of individuals. Please see Panel 1.4
for a discussion of the potential use of this gene in treatment of central nervous system
disorders.

G. NOV32b: EH DOMAIN-BINDING MITOTIC PHOSPHOPROTEIN 

Expression of gene NOV32b was assessed using the primer-probe set Ag3088,
described in Table GA. Results of the RTQ-PCR runs are shown in Tables GB, GC and GD.

Table GA. Probe Name Ag3088

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-cacgtttacaaggccatgac-3' | 20 | 1096 | 363
Probe | TET-5'-atggagtacctcatcaagaccggctc-3'-TAMRA | 26 | 1120 | 364
Reverse | 5'-atgttctccttgcactgctg-3' | 20 | 1159 | 365



Table GB. CNS_neurodegeneration_v1.0

Tissue Name | Rel. Exp.(%) Ag3088, Run 208974163 | Tissue Name | Rel. Exp.(%) Ag3088, Run 208974163
AD 1 Hippo | 19.8 | Control (Path) 3 Temporal Ctx | 14.0
AD 2 Hippo | 35.6 | Control (Path) 4 Temporal Ctx | 44.4
AD 3 Hippo | 17.9 | AD 1 Occipital Ctx | 27.9
AD 4 Hippo | 17.4 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 100.0 | AD 3 Occipital Ctx | [illegible]
AD 6 Hippo | 62.0 | AD 4 Occipital Ctx | 92.7
Control 2 Hippo | 58.2 | AD 5 Occipital Ctx | 71.2
Control 4 Hippo | 15.3 | AD 6 Occipital Ctx | 26.1
Control (Path) 3 Hippo | 13.5 | Control 1 Occipital Ctx | 8.7
AD 1 Temporal Ctx | 29.1 | Control 2 Occipital Ctx | 77.9
AD 2 Temporal Ctx | 47.6 | Control 3 Occipital Ctx | 29.7
AD 3 Temporal Ctx | 14.8 | Control 4 Occipital Ctx | 9.8
AD 4 Temporal Ctx | 34.2 | Control (Path) 1 Occipital Ctx | 69.3
AD 5 Inf Temporal Ctx | 84.7 | Control (Path) 2 Occipital Ctx | 16.8
AD 5 Sup Temporal Ctx | 47.6 | Control (Path) 3 Occipital Ctx | 7.1
AD 6 Inf Temporal Ctx | 65.5 | Control (Path) 4 Occipital Ctx | 24.7
AD 6 Sup Temporal Ctx | 60.3 | Control 1 Parietal Ctx | 13.9
Control 1 Temporal Ctx | 10.0 | Control 2 Parietal Ctx | 66.9
Control 2 Temporal Ctx | 69.3 | Control 3 Parietal Ctx | 19.9
Control 3 Temporal Ctx | 33.9 | Control (Path) 1 Parietal Ctx | 63.7
Control 3 Temporal Ctx | 17.9 | Control (Path) 2 Parietal Ctx | 33.2
Control (Path) 1 Temporal Ctx | 67.8 | Control (Path) 3 Parietal Ctx | 8.7
Control (Path) 2 Temporal Ctx | 52.5 | Control (Path) 4 Parietal Ctx | 59.5



Table GC. Panel 1.3D 



Tissue Name | Rel. Exp.(%) Ag3088, Run 165552924 | Tissue Name | Rel. Exp.(%) Ag3088, Run 165552924


Liver adenocarcinoma 


37.1 


Kidney (fetal) 


11.7 


Pancreas 


18.9 


Renal ca. 786-0 


11.6 


Pancreatic ca. CAPAN 2 


31.2 


Renal ca. A498 


64.2 


Adrenal gland 


15.5 


Renal ca. RXF 393 


{44.1 


Thyroid 


15.5 


Renal ca. ACHN 


25.7 


Salivary gland 


11.7 


Renal ca. UO-3 1 


36.6 


jPituitary gland 


10.3 


Renal ca. TK-10 


9.9 


Brain (fetal) 


46.7 


Liver 


12.1 


Brain (whole) 


70.2 


Liver (fetal) 


21.9 


jBrain (amygdala) 


64.6 


Liver ca. (hepatoblast) HepG2 


52.1 


Brain (cerebellum) 


53.2 


Lung 


17.2 


Brain (hippocampus) 


77.4 


Lung (fetal) 


18.6 


Brain (substantia nigra) 


29.3 


Lung ca. (small cell) LX-1 


12.4 


Brain (thalamus) 


55.5 


Lung ca. (small cell) NCI-H69 


19.1 


Cerebral Cortex 


84.7 


Lung ca. (s.cell var.) SHP-77 


24.8 


Spinal cord 


19.2 


Lung ca. (large cell)NCI-H460 


18.0 


glio/astro U87-MG 


43.5 


Lung ca. (non-sm. cell) A549 


24.0 


glio/astro U-l 18-MG 


100.0 


Lung ca. (non-s.cell) NCI-H23 


6.0 


astrocytoma SW1783 


38.7 


Lung ca. (non-s.cell) HOP-62 


11.6 


neuro*; met SK-N-AS 


66.0 


Lung ca. (non-s.cl) NCI-H522 


6.2 


astrocytoma SF-539 


21.6 


Lung ca. (squam.) SW 900 


17.9 


astrocytoma SNB-75 


65.1 


Lung ca. (squam.) NCI-H596 


31.9 


glioma SNB-19 


40.3 


Mammary gland 


16.7 


glioma U251 


40.6 


Breast ca.* (pl.ef) MCF-7 


19.6 


glioma SF-295 


32.1 


Breast ca.* (pl.ef) MDA-MB-23 1 


80.1 
Heart (fetal)


36.9 


iRrpast ra * (n\ pf\ T47l~) 


I 1 1 .0 


(Heart 


21 9 




i4d K 

p± .... 


takplptal mnsrlp ffptaH 


14 4 


Rrpa<;t ra MDA.M 


j 1 z.o 


^Irplptal mn^rlp 


84 7 


Ovarv 




Rone marrow 


12.1 


Ovarian ca OVPAR-1 


MQ 1 


Thvmi 
j ti y i niio 


6.7 


Ovarian ra OVPAR-4 
v cii tan vo. \j vv^n r\ *t 


;OJJ 


xnlppn 




Ovarian ra OVPAR-^ 
V7 V al lull La. UVLn^'J 


To i n 
Z J .u 


1 vmnh noHp 


18.7 


Ovarian ra OVPAR-8 

v/vai i til i cel. v-/ v cai\ o 




Colorprfal 


7.7 


Ovarian ra inROV-1 

v_/ vail ail wo. ivjixvyv - ! 


S 7 


^tomarh 


58 6 


Ovarian ra * fa«;ritp^ ^K-OV-^ 


40. 0 


^mall infp^tinp 


44.4 


Uterus 


10 Q 


Polon ra ^W4Rfi 

V^UIUI 1 La. ij WHOv 


1 R Q 


PI a fpn tci 


Q C 


Colon ra * S W620YSW4R0 mpf* 




rlUMaLC 


1 0.0 


Colon ra HT7Q 




Prnctatp ra * fhnnp mp^Pr. 1 ? 


?R7 7 


Colon ca. HCT-116 


19.1 


Testis 


23.8 


Colon ca. CaCo-2 


21.9 


•Melanoma Hs688(A).T 


|I5.0 


IColon ca. tissue(OD03866) 


16.3 


Melanoma* (met) Hs688(B).T 


112.4 


(Colon ca.HCC-2998 


9.6 


Melanoma UACC-62 


131.6 


{Gastric ca * (liver met) NCI-N87 


41.2 


Melanoma M 14 


[36.6 


jBladder 


22.4 


Melanoma LOX IMV1 


24.1 


jTrachea 


21.9 


Melanoma* (met) SK-MEL-5 


15.5 


jKidney 


24.0 


Adipose 


:g.s 


Table GD. Panel 2.2

Tissue Name | Rel. Exp.(%) Ag3088, Run 174268937 | Tissue Name | Rel. Exp.(%) Ag3088, Run 174268937


Normal Colon 


26.6 


Kidney Margin (OD04348) 


94.6 


Colon cancer (OD06064) 


14.9 


Kidney malignant cancer 
(OD06204B) 


12.2 


Colon Margin (OD06064) 


14.7 


Kidney normal adjacent tissue 
(OD06204E) 


29.1 


Colon cancer (OD06 159) 


16.3 


Kidney Cancer (OD04450-01 ) 


42.9 


Colon Margin (OD06159) 


25.3 


Kidney Margin (OD04450-03) 


40.6 


Colon cancer (OD06297-04) 


11.5 


Kidney Cancer 8120613 


5.7 


Colon Margin (OD06297-05) 


15.8 


Kidney Margin 8120614 


52.1 


CC Gr.2 ascend colon (OD0392 1 ) < 


).0 


Kidney Cancer 9010320 


15.5 


CC Margin (OD0392I) 


10.7 


Kidney Margin 9010321 


22.4 


Colon cancer metastasis (OD06 1 04) . 


5.8 1 


Kidney Cancer 8 120607 182.9 


Lung Margin (OD06 104) 


17.4 1 


Kidney Margin 8120608 135.4 


Colon mets to lung (OD0445 1-01) : 


>3.2 1 


formal Uterus 

. — — I,, a. 


13.0 
Lung Margin (OD04451-02)


19.1 (uterine Cancer 06401 1 


19 1 

1 L.J 


'Normal Prostate 

jl 'VI IIIUI J I vOtUlv 


22.4 ]Normal Thyroid 


7 S 


iProstate Cancer (OD044 1 0) 


7.8 jThyroid Cancer 0640 1 0 


12.9 


jriostate Margin (OD04410) 


7.5 


Thyroid Cancer A302I52 


27.9 


[Normal Ovary 


48.3 


Thyroid Margin A302 153 


7.1 


jUvanan cancer (OU06283-03) 


10.4 


Normal Breast 


15.0 


jOvarian Margin (OD06283-07) 


7.6 


Breast Cancer (OD04566) 


14.5 


i : — '"• ' f " , ' n ' 

jOvarian Cancer 064008 


11.9 


Breast Cancer 1024 


30.6 


•Ovarian cancer (OD06145) 


11.7 


Breast Cancer (OD04590-01) 


60.3 


Ovarian Margin (OD06145) 


19.8 


Breast Cancer Mets (OD04590-03) 


25.2 


Ovarian cancer (OD06455-03) 




Breast Cancer Metastasis 
(OD04655-05) 


55.1 


Ovarian Margin (OD06455-07) 


1.9 


Breast Cancer 064006 


20.3 


Normal Lung 


13.2 


Breast Cancer 9100266 


16.4 


Invasive poor diff. lung adeno 
i(OLKJ4945-0l 


22.1 


Breast Margin 9100265 


7.2 


Lung Margin (ODO4945-03) 


13.5 


Breast Cancer A209073 


7.7 


Lung Malignant Cancer (OD03126) 


12.2 


Breast Margin A2090734 


17.8 


Lung Margin (OD03 1 26) 


5.6 


Breast cancer (OD06083) 


29.5 


Lung Cancer (OD050 14A) 


15.3 


Breast cancer node metastasis 
(OD06083) 


30.1 


I line Maruin (OD0S014R^ 


19.8 


Normal Liver 




[ nno cancer fODOfiORH 


21.2 


Liver Cancer 1026 


£.1.1 


{Lung Margin (OD06081) 


12.8 


Liver Cancer 1025 


80.1 


jLung Cancer (OD04237-01) 


5.5 


Liver Cancer 6004-T 


51.1 


Lung Margin (OD04237-02) 


23.5 


Liver Tissue 6004-N 


6.9 


Ocular Melanoma Metastasis 


16.4 jLiver Cancer 6005-T 


54.3 


Ocular Melanoma Margin (Liver) 


19.2 Liver Tissue 6005-N 


100.0 


Melanoma Metastasis 


21.3 


Liver Cancer 064003 


62.4 


Melanoma Margin (Lung) 


6.9 


Normal Bladder 


19.8 


Normal Kidney 


12.8 


Bladder Cancer 1023 


10.0 


Kidney Ca, Nuclear grade 2 
i(OD04338) 


59.5 


Bladder Cancer A 302 173 


24.3 


jKidney Margin (OD04338) 


18.0 


Normal Stomach 


97.9 


IKidney Ca Nuclear grade 1/2 
•(OD04339) 


55.9 


Gastric Cancer 9060397 


13.5 


'Kidney Margin (OD04339) 


26.4 


Stomach Margin 9060396 


41.2 


IKidney Ca, Clear cell type 
(OD04340) 


13.4 


Gastric Cancer 9060395 


25.9 


Kidney Margin (OD04340) 


28.9 


Stomach Margin 9060394 


37.4 


[kidney Ca, Nuclear grade 3 
;(OD04348) 


12.1 


Gastric Cancer 064005 


30.8 
CNS_neurodegeneration_v1.0 Summary: Ag3088 This panel confirms the
expression of this gene at low levels in the brains of an independent group of individuals.
However, no differential expression of this gene was detected between Alzheimer's diseased
postmortem brains and those of non-demented controls in this experiment. Please see Panel
1.3D for a discussion of the potential utility of this gene in treatment of central nervous system
disorders.

Panel 1.3D Summary: Ag3088 This gene is widely expressed in many of the
samples in this panel, with highest expression in a brain cancer U-118-MG cell line (CT =
26). This gene is also highly expressed in all the regions of the central nervous system,
including the amygdala, cerebellum, hippocampus, substantia nigra, thalamus, cerebral cortex
and spinal cord. Therefore, therapeutic modulation of this gene product may be useful in the
treatment of central nervous system disorders such as Alzheimer's disease, Parkinson's
disease, epilepsy, multiple sclerosis, schizophrenia and depression. This gene codes for a
homolog of epsin, which is involved in the phagocytosis of macromolecules, and interacts
with Huntingtin-interacting protein. Therefore, this gene may play a critical role in the
endocytosis of Huntingtin protein and the etiology of Huntington's disease. Downregulation
of this gene or its protein product may be of therapeutic benefit in the treatment of
Huntington's disease.

This gene is also expressed in many tissues with metabolic function, including
pancreas, adrenal, thyroid, and pituitary glands, skeletal muscle, heart, liver and the
gastrointestinal tract. Therefore, therapeutic modulation of the activity of this gene may prove
useful in the treatment of endocrine/metabolically related diseases, such as obesity and
diabetes.

This gene is highly expressed in cell lines derived from melanoma, renal, breast,
brain, ovarian, lung, colon, kidney, pancreatic and prostate cancers. Expression of this gene is
higher in cancer cell lines than in the corresponding normal tissues. Based on this
expression profile, the expression of this gene may be used as a marker to detect these
cancers. Furthermore, therapeutic modulation of this gene may be useful in the treatment of
these cancers.

Panel 2.2 Summary: Ag3088 Highest expression of this gene is detected in normal
liver tissue (CT = 27.3). In addition, the level of expression in some lung, breast, liver and
kidney cancer tissue samples is higher than in the corresponding adjacent control normal tissue.
The reverse appears to be true for colon, ovary and stomach tissue, where expression is
slightly higher in normal tissue than in the matched cancer tissues. Thus, based upon its profile,
the expression of this gene may be used to distinguish between these cancers and the normal
adjacent tissue. Please see Panel 1.3 for further discussion on the utility of this gene.

H. NOV57a: GUANINE NUCLEOTIDE-BINDING PROTEIN GAMMA-7 SUBUNIT
Expression of gene NOV57a was assessed using the primer-probe set Ag4907,
described in Table HA. Results of the RTQ-PCR runs are shown in Tables HB, HC and HD.
Table HA. Probe Name Ag4907



Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-gccgatctgctgaagttct-3' | 19 | 127 | 366
Probe | TET-5'-aggccaagaatgaccccttccttgt-3'-TAMRA | 25 | 155 | 367
Reverse | 5'-gcttcttctccttgaaggagtt-3' | 22 | 199 | 368
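
The lengths listed in Table HA can be checked directly against the sequences. The following Python sketch is illustrative only and is not part of the original disclosure: the helper names (`strip_labels`, `gc_fraction`, `check_set`) are hypothetical, and the sequences are copied from Table HA with the OCR spacing removed.

```python
# Illustrative check of the Table HA primer-probe set (Ag4907).
# Helper names are hypothetical; sequences are copied from Table HA.

AG4907 = {
    "Forward": ("5'-gccgatctgctgaagttct-3'", 19),
    "Probe": ("TET-5'-aggccaagaatgaccccttccttgt-3'-TAMRA", 25),
    "Reverse": ("5'-gcttcttctccttgaaggagtt-3'", 22),
}


def strip_labels(entry: str) -> str:
    """Return only the bases, dropping dye labels and the 5'/3' end marks."""
    core = entry.split("5'-", 1)[1].split("-3'", 1)[0]
    return core.lower()


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "gc" for base in seq) / len(seq)


def check_set(table: dict) -> dict:
    """Map each oligo name to (bases, listed length matches, GC fraction)."""
    result = {}
    for name, (entry, listed_length) in table.items():
        seq = strip_labels(entry)
        result[name] = (seq, len(seq) == listed_length, gc_fraction(seq))
    return result


if __name__ == "__main__":
    for name, (seq, ok, gc) in check_set(AG4907).items():
        print(f"{name:8s} {seq:27s} length ok: {ok}  GC: {gc:.2f}")
```

The same check can be applied to the other primer-probe tables in this section by substituting their sequences and listed lengths.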




Table HB. CNS_neurodegeneration_v1.0



Tissue Name 


Rel. 

Exp.(%) 
Ag4907, 
Run 

214955039 


Tissue Name 


Rel.

Exp.(%) 
Ag4907, 
Run 

214955039 


AD 1 Hippo 


16.5 


Control (Path) 3 Temporal Ctx 


4.8 


AD 2 Hippo 


12.6 


Control (Path) 4 Temporal Ctx 28.5 


AD 3 Hippo 


17.7


AD 1 Occipital Ctx 26.6


AD 4 Hippo 


4.1 


AD 2 Occipital Ctx (Missing) 0.0


AD 5 Hippo 


100.0 


AD 3 Occipital Ctx 27.0


AD 6 Hippo 


45.4 


AD 4 Occipital Ctx 6.0


Control 2 Hippo 


14.2 


AD 5 Occipital Ctx 


10.9 


Control 4 Hippo 


9.0 


AD 6 Occipital Ctx 


13.9 


Control (Path) 3 Hippo 


9.1 


Control 1 Occipital Ctx 


9.5 


AD 1 Temporal Ctx 


24.3 


Control 2 Occipital Ctx 


54.7 


AD 2 Temporal Ctx 


29.3 


Control 3 Occipital Ctx 


29.5 


AD 3 Temporal Ctx 


31.4 


Control 4 Occipital Ctx 


12.1 


AD 4 Temporal Ctx 


32.1 


Control (Path) 1 Occipital Ctx 


45.7 


AD 5 Inf Temporal Ctx 


81.2 


Control (Path) 2 Occipital Ctx 


23.5 


AD 5 Sup Temporal Ctx


32.5 


Control (Path) 3 Occipital Ctx 


3.0 


AD 6 Inf Temporal Ctx 


17.9 


Control (Path) 4 Occipital Ctx 


22.5 


AD 6 Sup Temporal Ctx 


69.3 


Control 1 Parietal Ctx 


11.3 


Control 1 Temporal Ctx 9.6 


Control 2 Parietal Ctx 


41.5 


Control 2 Temporal Ctx 1 0.9 


Control 3 Parietal Ctx 


21.0 


Control 3 Temporal Ctx 1 7.3 


Control (Path) 1 Parietal Ctx 


43.2 
Control 3 Temporal Ctx 23.0


Control (Path) 2 Parietal Ctx 


20.2


Control (Path) 1 Temporal Ctx 27.9


Control (Path) 3 Parietal Ctx 


11.8 


Control (Path) 2 Temporal Ctx 32.8


Control (Path) 4 Parietal Ctx 


24.1 



Table HC. General_screening_panel_v1.5



Tissue Name 


Rel. 

Exp.(%) 
Ag4907, 
Run 

L£.<SoLyj\}j 


Tissue Name 


Rel.

Exp.(%)
Ag4907,
Run

\££oo£yj\)j 


/\UipObc 


c < 
j.j 


p PnQ i «~ Ti<r in 

rxcndl Ca. 1 rv- 1 U 


\l% 7 
\JO.Z 


IVlCIallUIIld. nbOOO^A J- 1 


1 fi 
I .O 


Diauucr 


16 7 

1 o. / 


Mplnnnma* l-k^ftftfR^ T 

iviciaiiurria nbooo^Dj. i 


4 1 




inn n 


IVlClalKJlIId 1V1IH 


z t .o 


Gastric ca. KATO III 


43.5


IVIclanOiTia LA^AIlVlvl 


17 A 


Colon ca. SW-948 


h 1,7 


ivieianoma orw-ivicLrj 


\LL,J 


Colon ca. SW480 


26.6 


oLjUalllUUb LCM LalClllUlIla •jv^v^-H 




Colon ca.* (SW480 met) SW620 


28.7 


1 CMIb rUUI 




Colon ca. HT29 


20.4 






Colon ca. HCT-116 


30.4 


n I Uj laic ruu I 




Colon ca. CaCo-2 


33.9 


PI JicAnta 




Colon cancer tissue 


15.8 


I ItPriK Pool 


7.5 


Colon ca. SW1116


4.7


Ovarian ca. OVCAR-3


43.8


Colon ca. Colo-205 


Si 13 ~ 


Ovarian ca. SK-OV-3


42.3 


Colon ca. SW-48 


0.0


Ovarian ca. OVCAR-4


15.6 


Colon Pool 


22.5 


Ovarian ca OVCAR-5 


67.4 


Small Intestine Pool 


14.2 


Ovarian ca. IGROV-1


13.0 


Stomach Pool 


13.6 


Ovarian ca OVCAR-8 


15.3 


Bone Marrow Pool 4.7


Ovary


14.4 


Fetal Heart 6.0


Breast ca. MCF-7 


39.0 


Heart Pool 2.7


Breast ca. MDA-MB-231 


37.6 


Lymph Node Pool 


13.9 


Breast ca. BT 549 


32.8 


Fetal Skeletal Muscle 


12.7 


Breast ca. T47D 


23.0 


Skeletal Muscle Pool 




Breast ca. MDA-N 


20.9 


Spleen Pool 


T.6 


Breast Pool 


15.8 


Thymus Pool 


19.8 


Trachea 


4.9


CNS cancer (glio/astro) U87-MG


34.9 


Lung 


2.5 


CNS cancer (glio/astro) U-118-MG


73.7 


Fetal Lung 


35.4 


CNS cancer (neuro;met) SK-N-AS 


50.3 


Lung ca. NCI-N417


20.6


CNS cancer (astro) SF-539 


11.9 


Lung ca. LX-1


41.8 


CNS cancer (astro) SNB-75 


51.4 


Lung ca. NCI-H146


26.2 


CNS cancer (glio) SNB-19


13.9 


Lung ca. SHP-77


63.7 


CNS cancer (glio) SF-295 


72.7 


Lung ca. A549 


17.1


Brain (Amygdala) Pool 


5.5 
6.3


Brain (cerebellum)


Al 1 


Lung ca. NCI-H23


32.8


Brain (fetal)


J 1 .U 


Lung ca. NCI-H460


15.1


Brain (Hippocampus) Pool


1 O O 


Lung ca. HOP-62


15.0


Cerebral Cortex Pool


10.0


Lung ca. NCI-H522


93.3


Brain (Substantia nigra) Pool


in 7 


Liver 


0.9 


Brain (Thalamus) Pool 


7 7 


Fetal Liver 


9.2 


Brain (whole) 


12.8 


Liver ca. HepG2 


19.5 


Spinal Cord Pool 


2.3 


Kidney Pool 


31.0 


Adrenal Gland 


12.7 


Fetal Kidney 


42.3 


Pituitary gland Pool 


13.7 


Renal ca. 786-0 


21.9 


Salivary Gland 


6.6 


Renal ca. A498 


11.9 


Thyroid (female) 


7.8 


Renal ca. ACHN 


30.8 


Pancreatic ca. CAPAN2 


40.9 


Renal ca. UO-31


9.4 


Pancreas Pool 


34.6



Table HD. Panel 4.1D



Tissue Name 


Rel. 

Exp.(%) 
Ag4907, 
Run 

223458613 


Tissue Name 


Rel. 

Exp.(%) 
Ag4907, 
Run 

223458613 


Secondary Thl act 


43.2 


HUVEC IL-1beta


14.4 


Secondary Th2 act 


37.1 


HUVEC IFN gamma


36.3 


Secondary Trl act 


7.4 


HUVEC TNF alpha + IFN gamma


0.0 


Secondary Thl rest 


6.6 


HUVEC TNF alpha + IL4 


0.0 


Secondary Th2 rest


8.4 


HUVEC IL-11


35.6 


Secondary Trl rest 


5.8 


Lung Microvascular EC none 


32.8 


Primary Thl act 


69.7 


Lung Microvascular EC TNFalpha + IL-1beta


37.4 


Primary Th2 act 


96.6 


Microvascular Dermal EC none 


9.4 


Primary Trl act 


63.7 


Microvascular Dermal EC TNFalpha + IL-1beta


16.6 


Primary Thl rest 


0.0 


Bronchial epithelium TNFalpha + IL-1beta


24.5 


Primary Th2 rest


7.6 


Small airway epithelium none 


26.8 


Primary Tr1 rest


14.6 


Small airway epithelium TNFalpha + IL-1beta


19.9 


CD45RA CD4 lymphocyte act


46.3 


Coronary artery SMC rest


0.0 


CD45RO CD4 lymphocyte act


58.2 


Coronary artery SMC TNFalpha + IL-1beta


34.6 


CD8 lymphocyte act 


57.0 


Astrocytes rest 


48.3 


Secondary CD8 lymphocyte rest 


37.1 


Astrocytes TNFalpha + IL-1beta


14.0 


Secondary CD8 lymphocyte act 


18.2 


KU-812 (Basophil) rest 


37.4 


CD4 lymphocyte none 


16.6 


KU-812 (Basophil) PMA/ionomycin


27.5 
2ry Th1/Th2/Tr1_anti-CD95 CH11


13.5 


CCD1106 (Keratinocytes) none 23.7


LAK cells rest 


7.4 


CCD1 106 (Keratinocytes) TNFalpha + * , 
IL-1 beta p 9 " 5 


LAK cells IL-2 


15.2


Liver cirrhosis 6.4


LAK cells IL-2+IL-12


9.2 


NCI-H292 none 61.1


LAK cells IL-2+IFN gamma 


17 T 


NCI-H292 IL-4 48.6


LAK cells IL-2+ IL-18


14.4


NCI-H292 IL-9 


100.0 


LAK cells PMA/ionomycin


12.4


NCI-H292 IL-13 


66.0


NK Cells IL-2 rest 


22.4 


NCI-H292 IFN gamma


54.3 


Two Way MLR 3 day


6.0


HPAEC none 


46.7 


Two Way MLR 5 day


16.1


HPAEC TNF alpha + IL-1 beta 


24.5 


Two Way MLR 7 day


7.1


Lung fibroblast none 


11.4 


PBMC rest 


25.7 


Lung fibroblast TNF alpha + IL-1 beta 


6.7


PBMC PWM 


69.7 


Lung fibroblast IL-4 


20.4 


PBMC PHA-L 


20.7 


Lung fibroblast IL-9 


58.6 


Ramos (B cell) none 


34.6 


Lung fibroblast IL-13 23.3


Ramos (B cell) ionomycin


27.4 


Lung fibroblast IFN gamma 


12.0 


B lymphocytes PWM 


44.4 


Dermal fibroblast CCD1070 rest 


30.6 


B lymphocytes CD40L and IL-4 


27.5 


Dermal fibroblast CCD1070 TNF alpha 


29.3 


EOL-1 dbcAMP


41.2 


Dermal fibroblast CCD1070 IL-1 beta


35.4 


EOL-1 dbcAMP
PMA/ionomycin


12.5 


Dermal fibroblast IFN gamma 


13.9 
49.3 


Dendritic cells none 


12.3 


Dermal fibroblast IL-4 


Dendritic cells LPS 


7.4 


Dermal Fibroblasts rest 


14.3 


Dendritic cells anti-CD40 


6.8 


Neutrophils TNFa+LPS


0.0

0.0


Monocytes rest 


27.4 


Neutrophils rest 


Monocytes LPS 


7.9 


Colon 0.0


Macrophages rest 


7.9 


Lung i 1 7.4 


Macrophages LPS 


0.0 


Thymus 7.5 


HUVEC none 


11.6 


Kidney 40.1


HUVEC starved 


22.1 





CNS_neurodegeneration_v1.0 Summary: Ag4907 This panel confirms the
expression of this gene at low levels in the brains of an independent group of individuals.
However, no differential expression of this gene was detected between Alzheimer's diseased
postmortem brains and those of non-demented controls in this experiment. Please see Panel
1.5 for a discussion of the potential use of this gene in treatment of central nervous system
disorders.

General_screening_panel_v1.5 Summary: Ag4907 Highest expression of this gene
is detected in the gastric cancer NCI-N87 cell line (CT=31.3). Moderate levels of expression of
this gene are also seen in a cluster of cancer cell lines derived from pancreatic, gastric, colon,
lung, liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain
cancers. Thus, expression of this gene could be used as a marker to detect the presence of
these cancers. Furthermore, therapeutic modulation of the expression or function of this gene
may be effective in the treatment of pancreatic, gastric, colon, lung, liver, renal, breast,
ovarian, prostate, squamous cell carcinoma, melanoma and brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at
moderate to low levels in pancreas, adrenal gland, thyroid, pituitary gland, skeletal muscle,
fetal liver and the gastrointestinal tract. Therefore, therapeutic modulation of the activity of
this gene may prove useful in the treatment of endocrine/metabolically related diseases, such
as obesity and diabetes.

In addition, this gene is expressed at low levels in most regions of the central nervous
system examined, including amygdala, hippocampus, substantia nigra, thalamus, cerebellum,
and cerebral cortex. Therefore, therapeutic modulation of this gene product may be useful in
the treatment of central nervous system disorders such as Alzheimer's disease, Parkinson's
disease, epilepsy, multiple sclerosis, schizophrenia and depression.

Interestingly, this gene is expressed at much higher levels in fetal lung and liver
(CTs=32.8-34.8) than in adult lung and liver (CTs=36-38). This observation suggests that
expression of this gene can be used to distinguish fetal from adult lung and liver. In addition,
the relative overexpression of this gene in fetal tissue suggests that the protein product may
enhance growth or development of lung and liver in the fetus and thus may also act in a
regenerative capacity in the adult. Therefore, therapeutic modulation of the protein encoded
by this gene could be useful in treatment of lung and liver related diseases.
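
As a rough guide to what the CT ranges quoted above imply, the sketch below converts a CT difference into a fold difference in starting template. It assumes ideal amplification (product doubles each cycle), which this section does not state and which real assays only approximate; the function name `fold_difference` is hypothetical.

```python
def fold_difference(ct_lower_expr: float, ct_higher_expr: float,
                    amplification_per_cycle: float = 2.0) -> float:
    """Fold excess of template in the higher-expressing sample.

    A lower CT means more starting template. With perfect doubling
    (an assumption), each cycle of difference is a factor of two.
    """
    return amplification_per_cycle ** (ct_lower_expr - ct_higher_expr)


if __name__ == "__main__":
    fetal_cts = (32.8, 34.8)   # fetal lung and liver range quoted above
    adult_cts = (36.0, 38.0)   # adult lung and liver range quoted above
    smallest = fold_difference(min(adult_cts), max(fetal_cts))
    largest = fold_difference(max(adult_cts), min(fetal_cts))
    print(f"fetal over adult: {smallest:.1f}-fold to {largest:.1f}-fold")
```

Under these assumptions the quoted ranges correspond to roughly a 2-fold to 37-fold higher template level in the fetal samples.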

Panel 4.1D Summary: Ag4907 Low levels of expression of this gene are detected
mainly in the IL-9 treated mucoepidermoid cell line NCI-H292. The expression of this gene in
this mucoepidermoid cell line, which is often used as a model for airway epithelium,
suggests that this gene may be important in the proliferation or activation of airway
epithelium. Therefore, therapeutics designed with the protein encoded by this gene may
reduce or eliminate symptoms caused by inflammation in lung epithelia in chronic
obstructive pulmonary disease, asthma, allergy, and emphysema.

I. NOV58a: Novel 2410017P07RIK Protein - Like Gene
Expression of gene NOV58a was assessed using the primer-probe set Ag4913,
described in Table IA. Results of the RTQ-PCR runs are shown in Tables IB and IC.
Table IA. Probe Name Ag4913



Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-attcacccagaatgaaccatct-3' | 22 | 75 | 369
Probe | TET-5'-cagaattgccatcctgcaaacttaga-3'-TAMRA | 26 | 97 | 370
Reverse | 5'-tggctatttgggctatgaagta-3' | 22 | 140 | 371




Table IB. General_screening_panel_v1.5



Tissue Name


Rel.

Exp.(%)
Ag4913,
Run

228829778


Tissue Name


Rel.

Exp.(%)
Ag4913,
Run

228829778


Adipose 


7.9 


Renal ca. TK-10 


13.0


Melanoma* Hs688(A).T 


15.1 


Bladder 


13.8 


Melanoma* Hs688(B).T 


14.6 


Gastric ca. (liver met.) NCI-N87 


11.0 


Melanoma* M14 


8.0 


Gastric ca. KATO III 


27.2 


Melanoma* LOXIMVI 


19.5 


Colon ca. SW-948 


7.8 


Melanoma* SK-MEL-5 


13.4 


Colon ca. SW480 


20.4 


Squamous cell carcinoma SCC-4 


12.5 


Colon ca.* (SW480 met) SW620 


8.4 


Testis Pool 


21.0 


Colon ca. HT29 


10.7 


Prostate ca.* (bone met) PC-3 


11.3 


Colon ca. HCT-116 


55.5 


Prostate Pool 


7.6 


Colon ca. CaCo-2 


13.2 


Placenta 


2.0 


Colon cancer tissue 


13.8 


Uterus Pool 


8.4 


Colon ca. SW1116


3.7 


Ovarian ca. OVCAR-3 


7.7 


Colon ca. Colo-205 


3.3 


Ovarian ca. SK-OV-3 


27.4 


Colon ca. SW-48 


5.2 


Ovarian ca. OVCAR-4 


6.6 


Colon Pool 


16.3 


Ovarian ca. OVCAR-5


17.3 


Small Intestine Pool 


15.7 


Ovarian ca. IGROV-1


£T 


Stomach Pool 


10.4 


Ovarian ca. OVCAR-8 


6.7 


Bone Marrow Pool 


6.8 


Ovary


7.1 


Fetal Heart 


9.3 


Breast ca. MCF-7 


20.3 


Heart Pool 


7.8 


Breast ca. MDA-MB-231


39.8 


Lymph Node Pool 


21.8


Breast ca. BT 549 


100.0 


Fetal Skeletal Muscle 


10.9 


Breast ca. T47D 


10.0 


Skeletal Muscle Pool


24.5 


Breast ca. MDA-N 


9.7 


Spleen Pool 10.5


Breast Pool 


18.4 


Thymus Pool 15.4


Trachea 


8.9 


CNS cancer (glio/astro) U87-MG


26.6 


Lung 


4.0 


CNS cancer (glio/astro) U-118-MG


65.1
Fetal Lung


41.2 


CNS cancer (neuro;met) SK-N-AS 14.2


Lung ca. NCI-N417


5.0 


CNS cancer (astro) SF-539 19.3


Lung ca. LX-1 


23.8 


CNS cancer (astro) SNB-75 58.2


Lung ca. NCI-H146


11.9 


CNS cancer (glio) SNB-19 8.7


Lung ca. SHP-77 


32.1 


CNS cancer (glio) SF-295


30.1 


Lung ca. A549 


16.3 


Brain (Amygdala) Pool 7.2


Lung ca. NCI-H526 


11.4 


Brain (cerebellum) 


39.2 


Lung ca. NCI-H23 


19.5 


Brain (fetal) 18.7


Lung ca. NCI-H460


7.4 


Brain (Hippocampus) Pool 7.3 


Lung ca. HOP-62 


5.6 


Cerebral Cortex Pool 


14.0 


Lung ca. NCI-H522 


36.6 


Brain (Substantia nigra) Pool 6.4


Liver 


0.8 


Brain (Thalamus) Pool 


15.8 


Fetal Liver 


31.6 


Brain (whole) 


6.5 


Liver ca. HepG2


5.8 


Spinal Cord Pool 


7.8 


Kidney Pool 


25.3 


Adrenal Gland 


3.6 
4.9 


Fetal Kidney 


30.8 


Pituitary gland Pool 


Renal ca. 786-0 


26.8 


Salivary Gland 


1.1 


Renal ca. A498 


4.0 


Thyroid (female) 


4.5 


Renal ca. ACHN 


5.4 


Pancreatic ca. CAPAN2 


9.9 


Renal ca. UO-31


12.9 


Pancreas Pool 


13.7 



Table IC. Panel 4.1D



Tissue Name 


Rel. 

Exp.(%) 
Ag4913, 
Run 

223458616 


Tissue Name 


Rel. 

Exp.(%) 
Ag4913, 
Run 

223458616


Secondary Th 1 act 


38.2 


HUVEC IL-lbeta 


23.3


Secondary Th2 act 


42.0 


HUVEC IFN gamma


50.0 


Secondary Trl act 


38.2 


HUVEC TNF alpha + IFN gamma


33.4 


Secondary Thl rest 


10.0 


HUVEC TNF alpha + IL4 


20.9 


Secondary Th2 rest 


16.6 


HUVEC IL-11


21.3 


Secondary Trl rest 


12.5 


Lung Microvascular EC none 


49.7 


Primary Th 1 act 


12.3 


Lung Microvascular EC TNFalpha + IL-1beta


35.8 


Primary Th2 act 


24.1 


Microvascular Dermal EC none 


32.3 


Primary Trl act 


23.3 


Microvascular Dermal EC TNFalpha + IL-1beta


20.7 


Primary Th 1 rest 


12.3 


Bronchial epithelium TNFalpha + IL-1beta


12.6 


Primary Th2 rest 


9.3 


Small airway epithelium none 


r 4.5 


Primary Trl rest 


19.6 


Small airway epithelium TNFalpha + IL-1beta


8.5 


CD45RA CD4 lymphocyte act 


37.4 


Coronary artery SMC rest


13.2
CD45RO CD4 lymphocyte act


42.3 


Coronary artery SMC TNFalpha + IL-1beta


16.7 


CD8 lymphocyte act 


32.5 


Astrocytes rest 


7.1 


Secondary CD8 lymphocyte rest 


19.3 


Astrocytes TNFalpha + IL-1beta


9.2 


Secondary CD8 lymphocyte act 


14.0 


KU-812 (Basophil) rest


92.0 


CD4 lymphocyte none 


4.5


KU-812 (Basophil) PMA/ionomycin 


100.0 


2ry Th1/Th2/Tr1_anti-CD95


24.7 


CCD1106 (Keratinocytes) none


22.8 


LAK cells rest


11.3 


CCD1106 (Keratinocytes) TNFalpha + IL-1beta


28.3 


LAK cells IL-2 


24.0 


Liver cirrhosis


6.2


LAK cells IL-2+IL-12 


5.6 


NCI-H292 none 


18.6


LAK cells IL-2+IFN gamma


15.1 


NCI-H292 IL-4 


25.2


LAK cells IL-2+ IL-18 


14.9 


NCI-H292 IL-9 


38.4


LAK cells PMA/ionomycin 


6.1 


NCI-H292 IL-13 


39.2 


NK Cells IL-2 rest 


0.0 


NCI-H292 IFN gamma


46.3


Two Way MLR 3 day


12.0 


HPAEC none




Two Way MLR 5 day


17.4 


HPAEC TNF alpha + IL-1 beta


24.7


Two Way MLR 7 day 


13.9 


Lung fibroblast none


29.9


PBMC rest 


4.4 


Lung fibroblast TNF alpha + IL-1 beta 


16.3 


PBMC PWM


12.7 


Lung fibroblast IL-4 


11.6


PBMC PHA-L


20.7 


Lung Fibroblast IL-9 


12.9 


Ramos (B cell) none 


22.7 


Lung fibroblast IL-13 


14.2 


Ramos (B cell) ionomycin 


28.1 


Lung fibroblast IFN gamma 


37.1 


B lymphocytes PWM 


23. / 


Dermal fibroblast CCD1070 rest


66.9 


B lymphocytes CD40L and IL-4


21.8


Dermal fibroblast CCD1070 TNF alpha


76.3 


EOL-1 dbcAMP


jU.J 


Dermal fibroblast CCD1070 IL-1 beta


37.4


EOL-1 dbcAMP
PMA/ionomycin


23.7 


Dermal fibroblast IFN gamma 


2.0 


Dendritic cells none 


14.7 


Dermal fibroblast IL-4 


40.9 


Dendritic cells LPS 


9.9 


Dermal Fibroblasts rest 


24.1 


Dendritic cells anti-CD40 


10.9 


Neutrophils TNFa+LPS 


1.3 


Monocytes rest 


8.4 


Neutrophils rest 


6.4 


Monocytes LPS 


9.3 


Colon 


4.6 


Macrophages rest 


15.7 


Lung 


9.4 


Macrophages LPS 


5.8 


Thymus 


34.6 


HUVEC none


22.7 


Kidney 


19.3 


HUVEC starved 


26.1 







General_screening_panel_v1.5 Summary: Ag4913 Highest expression of this gene
is detected in the breast cancer BT 549 cell line (CT=26.4). Moderate to high levels of expression
of this gene are also seen in a cluster of cancer cell lines derived from pancreatic, gastric, colon,
lung, liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain
cancers. Thus, expression of this gene could be used as a marker to detect the presence of
these cancers. Furthermore, therapeutic modulation of the expression or function of this gene
may be effective in the treatment of pancreatic, gastric, colon, lung, liver, renal, breast,
ovarian, prostate, squamous cell carcinoma, melanoma and brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at
moderate levels in pancreas, adipose, adrenal gland, thyroid, pituitary gland, skeletal muscle,
heart, liver and the gastrointestinal tract. Therefore, therapeutic modulation of the activity of
this gene may prove useful in the treatment of endocrine/metabolically related diseases, such
as obesity and diabetes.

Interestingly, this gene is expressed at much higher levels in fetal lung and liver
(CTs=27.7-28) than in adult lung and liver (CTs=31-33). This observation suggests that
expression of this gene can be used to distinguish fetal from adult lung and liver. In addition,
the relative overexpression of this gene in fetal tissue suggests that the protein product may
enhance growth or development of liver and lung in the fetus and thus may also act in a
regenerative capacity in the adult. Therefore, therapeutic modulation of the protein encoded
by this gene could be useful in treatment of lung and liver related diseases.
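
The CT values quoted above can be related to the Rel. Exp.(%) column of Table IB. The sketch below assumes, as a common convention for panels of this kind and not as something stated in this section, that the sample with the lowest CT is set to 100% and that every additional cycle halves the relative expression. The function names are hypothetical; the input values are copied from Table IB and from the summary above (BT 549, CT=26.4).

```python
import math


def relative_expression(ct: float, ct_min: float) -> float:
    """Percent expression relative to the lowest-CT sample (assumed model)."""
    return 100.0 * 2.0 ** (ct_min - ct)


def ct_from_relative_expression(rel_percent: float, ct_min: float) -> float:
    """Invert the assumed model: recover a CT from a Rel. Exp.(%) value."""
    if rel_percent <= 0:
        raise ValueError("a CT cannot be recovered from 0% expression")
    return ct_min + math.log2(100.0 / rel_percent)


if __name__ == "__main__":
    ct_min = 26.4  # BT 549, the 100% sample in Table IB
    table_ib = {"Fetal Lung": 41.2, "Fetal Liver": 31.6,
                "Lung": 4.0, "Liver": 0.8}
    for tissue, rel in table_ib.items():
        ct = ct_from_relative_expression(rel, ct_min)
        print(f"{tissue:12s} {rel:5.1f}%  CT ~ {ct:.1f}")
```

Under this assumed model the fetal samples come out near CT 27.7 and 28.1 and the adult samples near CT 31.0 and 33.4, which is in line with the ranges quoted in the text.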

In addition, this gene is expressed at moderate levels in all regions of the central
nervous system examined, including amygdala, hippocampus, substantia nigra, thalamus,
cerebellum, cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene
product may be useful in the treatment of central nervous system disorders such as
Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and
depression.

Panel 4.1D Summary: Ag4913 Highest expression of this gene is detected in
basophils (CTs=29). This gene is expressed at high to moderate levels in a wide range of cell
types of significance in the immune response in health and disease. These cells include
members of the T-cell, B-cell, endothelial cell, macrophage/monocyte, and peripheral blood
mononuclear cell family, as well as epithelial and fibroblast cell types from lung and skin,
and normal tissues represented by colon, lung, thymus and kidney. This ubiquitous pattern of
expression suggests that this gene product may be involved in homeostatic processes for
these and other cell types and tissues. This pattern is in agreement with the expression profile
in General_screening_panel_v1.5 and also suggests a role for the gene product in cell
survival and proliferation. Therefore, modulation of the gene product with a functional
therapeutic may lead to the alteration of functions associated with these cell types and lead to
improvement of the symptoms of patients suffering from autoimmune and inflammatory
diseases such as asthma, allergies, inflammatory bowel disease, lupus erythematosus,
psoriasis, rheumatoid arthritis, and osteoarthritis.

J. NOV59a: Novel FLJ20565 Like Gene
Expression of gene NOV59a was assessed using the primer-probe set Ag4914,
described in Table JA. Results of the RTQ-PCR runs are shown in Table JB.
Table JA. Probe Name Ag4914



Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-ggggaagaaaagaaacaaagag-3' | 22 | 646 | 372
Probe | TET-5'-ccccaacacagcctaaggccaag-3'-TAMRA | 23 | 696 | 373
Reverse | 5'-tcttaggctttccctctttagg-3' | 22 | 722 | 374



Table JB. General_screening_panel_v1.5



Tissue Name 


Rel. 

Exp.(%) 
Ag4914, 
Run 

228839040 


Tissue Name 


Rel. 

Exp.(%) 
Ag4914, 
Run 

228839040 


Adipose 


3.6 


Renal ca. TK-10 


0.0 


Melanoma* Hs688(A).T 


0.0 


Bladder 


0.0


Melanoma* Hs688(B).T


0.0 


Gastric ca. (liver met.) NCI-N87 


IT4.5 


Melanoma* M14




0.0 


Gastric ca. KATO III


0.0


Melanoma* LOXIMVI




0.0 




Colon ca. SW-948 


0.0 


Melanoma* SK-MEL-5 


'I 


0.0 


i 


Colon ca. SW480 


0.0 


Squamous cell carcinoma SCC-4 


j 


0.0 


Colon ca.* (SW480 met) SW620


0.0 


Testis Pool 


20.0


I 


Colon ca. HT29 


0.0 


Prostate ca.* (bone met) PC-3




0.0 


I 


Colon ca. HCT-116 


0.0 


Prostate Pool


I 


0.0 


Colon ca. CaCo-2 0.0


Placenta


0.0


! 


Colon cancer tissue 0.0


Uterus Pool


0.0


"I 


Colon ca. SW1116 7.1


Ovarian ca. OVCAR-3


0.0


Colon ca. Colo-205


0.0 


Ovarian ca. SK-OV-3 


0.0


Colon ca. SW-48


0.0 


Ovarian ca. OVCAR-4 


0.0


Colon Pool


60.7 


Ovarian ca. OVCAR-5 


0.0


Small Intestine Pool


0.0 


Ovarian ca. IGROV-1




3.0 


Stomach Pool


22.7 


Ovarian ca. OVCAR-8 


0.0


Bone Marrow Pool


0.0 


Ovary 


0.0


Fetal Heart


0.0 


Breast ca. MCF-7 


0.0


Heart Pool


0.0 
Breast ca. MDA-MB-231


10 0 

, , : „ ■ 


Lymph Node Pool 


|o.o 




Breast ca BT 549 


0 0 


Fetal Skeletal Muscle 


jO.O 




Breast ca T47D 


!o o 

| v/.v/ 


Skeletal Muscle Pool 0.0


Breast ca. MDA-N 


0.0


Spleen Pool 0.0


Breast Pool 


35.1


Thymus Pool 


|0.0 




Trachea 


0 0 

i 


CNS cancer (glio/astro) U87-MG 






Lung


l?4 1 


CNS cancer (glio/astro) U-118-MG


~ 174/7 




Fetal Lung


0.0


CNS cancer (neuro;met) SK-N-AS 


0.0




Lung ca. NCI-N417


0.0


CNS cancer (astro) SF-539 


jO.O 




Lung ca. LX-1


0.0


CNS cancer (astro) SNB-75 


114.4 




Lung ca. NCI-H146


0.0


CNS cancer (glio) SNB- 1 9 10.0 


Lung ca SHP-77 


JfiQ 7 


CNS cancer (glio) SF-295 


|0.0 ~ 




Lung ca. A549


0.0


Brain (Amygdala) Pool 


|o.o 




Lung ca. NCI-H526


0.0


Brain (cerebellum) 0.0


Lung ca. NCI-H23


0.0


Brain (fetal) 0.0


Lung ca. NCI-H460


0.0


Brain (Hippocampus) Pool 


jo.o 




Lung ca. HOP-62


0.0


Cerebral Cortex Pool 0.0


Lung ca. NCI-H522


0.0


Brain (Substantia nigra) Pool 


jo.o 




Liver 


jo.o 


Brain (Thalamus) Pool 


|0.0 




Fetal Liver 


jo.o 


Brain (whole) 


!0.0 




Liver ca. HepG2 


Jo.o 


Spinal Cord Pool 0.0


Kidney Pool


18.8


Adrenal Gland 


jo.o 




Fetal Kidney 


jo.o 


Pituitary gland Pool 


jo.o 




Renal ca. 786-0 


jo.o 


Salivary Gland 


io.o 




Renal ca. A498 


Jo.o 


Thyroid (female) 


]0.0 




Renal ca. ACHN 


jo.o 


Pancreatic ca. CAPAN2 


100.0




Renal ca. UO-31 


jo.o 


Pancreas Pool 


jo.o 





General_screening_panel_v1.5 Summary: Ag4914 Low levels of expression of
this gene are restricted to a pancreatic cancer cell line (CT=34.5). Therefore, expression of this
gene may be used to distinguish this sample from other samples in this panel and also as a
diagnostic marker for detection of pancreatic cancer. Furthermore, therapeutic modulation of
this gene may be useful in the treatment of this cancer.

K. NOV60a: CGI-27 Protein Like Gene 

Expression of gene NOV60a was assessed using the primer-probe set Ag4915, 
described in Table KA. Results of the RTQ-PCR runs are shown in Tables KB and KC. 
Table KA. Probe Name Ag4915



Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-agccactctgggaactggta-3' | 20 | 31 | 375
Probe | TET-5'-ctcaggaccacagctgaatgcacag-3'-TAMRA | 25 | 57 | 376
Reverse | 5'-tacttgtgaaagccaaccttct-3' | 22 | 84 | 377
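
The Length and Start Position columns of Table KA can be checked for internal consistency. The sketch below assumes that each start position is a 1-based coordinate on a single reference strand and that each oligo covers `start` through `start + length - 1`; this section does not state that convention, so treat the result as illustrative. The class and function names are hypothetical.

```python
from typing import NamedTuple


class Oligo(NamedTuple):
    name: str
    start: int    # Start Position column (assumed 1-based, same strand)
    length: int   # Length column

    @property
    def end(self) -> int:
        """Last position covered by the oligo under the assumed convention."""
        return self.start + self.length - 1


def layout_is_consistent(forward: Oligo, probe: Oligo, reverse: Oligo) -> bool:
    """True if the probe lies between the primers without overlapping them."""
    return forward.end < probe.start and probe.end < reverse.start


def amplicon_span(forward: Oligo, reverse: Oligo) -> int:
    """Number of positions from the forward start to the reverse end."""
    return reverse.end - forward.start + 1


if __name__ == "__main__":
    # Values copied from Table KA (Ag4915).
    fwd = Oligo("Forward", 31, 20)
    prb = Oligo("Probe", 57, 25)
    rev = Oligo("Reverse", 84, 22)
    print("layout consistent:", layout_is_consistent(fwd, prb, rev))
    print("span covered (positions):", amplicon_span(fwd, rev))
```

Under the stated assumption, the Ag4915 forward primer covers positions 31 to 50, the probe 57 to 81 and the reverse primer 84 to 105, giving a span of 75 positions.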



Table KB. General_screening_panel_v1.5



Tissue Name 


Rel. 

Exp.(%) 
Ag4915, 
Run

228839041 



Tissue Name 


Rel. 

Exp.(%) 
Ag4915, 
Run 

228839041 


Adipose


0.0 


Renal ca. TK-10 


30.1 


Melanoma* Hs688(A).T 


3.0 


Bladder 


11.8 


Melanoma* Hs688(B).T


4.4 


Gastric ca. (liver met.) NCI-N87


45.4 


Melanoma* M14 


15.7 


Gastric ca. KATO III 


9.3 


Melanoma* LOXIMVI


26.6 


Colon ca. SW-948 


0.0 


Melanoma* SK-MEL-5 


10.2 


Colon ca. SW480 


7.5 


Squamous cell carcinoma SCC-4


3.3 


Colon ca.* (SW480 met) SW620


4.3 


Testis Pool 


100.0 


Colon ca. HT29


5.8


Prostate ca.* (bone met) PC-3 


6.0 


Colon ca. HCT-116


86.5 


Prostate Pool 


0.0 


Colon ca. CaCo-2 


24.7 


Placenta 


0.0 


Colon cancer tissue 


9.2 


Uterus Pool 


0.0 


Colon ca. SW1116


20.7 


Ovarian ca. OVCAR-3 


33.7 


Colon ca. Colo-205 


11.1 


Ovarian ca. SK-OV-3 


36.9 


Colon ca. SW-48 


0.0


Ovarian ca. OVCAR-4 


3.3 


Colon Pool 


0.0 


Ovarian ca. OVCAR-5 


71.2 



Small Intestine Pool 


8.4 


Ovarian ca. IGROV-1 


88.9 


Stomach Pool 


17.0


Ovarian ca. OVCAR-8 


3.6 


Bone Marrow Pool 


15.8 


Ovary 


2.6 


Fetal Heart 


0.0 


Breast ca. MCF-7 


0.0 


Heart Pool 


0.0 


Breast ca. MDA-MB-231 


4.2 


Lymph Node Pool 


5.0 


Breast ca. BT 549 


59.9 


Fetal Skeletal Muscle 


0.0 


Breast ca. T47D 


0.0 


Skeletal Muscle Pool 


61.1 


Breast ca. MDA-N 


0.0 


Spleen Pool 


0.0 


Breast Pool 


5.3 


Thymus Pool 


8.7 


Trachea 


0.0 


CNS cancer (glio/astro) U87-MG 


31.4 


Lung 


0.0 


CNS cancer (glio/astro) U-118-MG


16.2 


Fetal Lung 


10.7 


CNS cancer (neuro;met) SK-N-AS 


7.6 


Lung ca. NCI-N417 


2.7 


CNS cancer (astro) SF-539 


3.7 


Lung ca. LX-1


31.4 


CNS cancer (astro) SNB-75 


40.1 


Lung ca. NCI-H146


92.0 


CNS cancer (glio) SNB-19


25.2


Lung ca. SHP-77


3.0 


CNS cancer (glio) SF-295 


18.6 


Lung ca. A549


8.0 


Brain (Amygdala) Pool 


7.9 


Lung ca. NCI-H526


0.0 


Brain (cerebellum) 


18.7 
Lung ca. NCI-H23


17.9 


Brain (fetal) 


42.9 


Lung ca. NCI-H460


10.4 


Brain (Hippocampus) Pool 


7.2 


Lung ca. HOP-62


12.4 


Cerebral Cortex Pool 


57.4 


Lung ca. NCI-H522


8.8 


Brain (Substantia nigra) Pool 


24.8 


Liver


0.0 


Brain (Thalamus) Pool 


49.7 


Fetal Liver


0.0 


Brain (whole) 


48.6 


Liver ca. HepG2 


6.3 


Spinal Cord Pool 


18.6 


Kidney Pool 


7.3 


Adrenal Gland


7.2 


Fetal Kidney


22.4 


Pituitary gland Pool 


0.0 


Renal ca. 786-0 


20.9 


Salivary Gland 


14.1 


Renal ca. A498 


48.0 


Thyroid (female) 


0.0 


Renal ca. ACHN 


8.7 


Pancreatic ca. CAPAN2 


29.7 


Renal ca. UO-31 


22.5 


Pancreas Pool 


8.4 



Table KC. Panel 4.1D



Tissue Name


Rel. Exp.(%) Ag4915, Run 223458640


Tissue Name


Rel. Exp.(%) Ag4915, Run 223458640


Secondary Th1 act


0.9


HUVEC IL-1beta


0.1


Secondary Th2 act


1.1


HUVEC IFN gamma


jo.q 


Secondary Tr1 act


0.4


HUVEC TNF alpha + IFN gamma


0.0


Secondary Th1 rest


0.0


HUVEC TNF alpha + IL4


0.0


Secondary Th2 rest


0.0


HUVEC IL-11


0.0


Secondary Tr1 rest


100.0


Lung Microvascular EC none


0.0


Primary Th1 act


0.0


Lung Microvascular EC TNFalpha + IL-1beta


0.0


Primary Th2 act


0.2


Microvascular Dermal EC none


0.0


Primary Tr1 act


0.1


Microvascular Dermal EC TNFalpha + IL-1beta


0.0


Primary Th1 rest


0.0


Bronchial epithelium TNFalpha + IL-1beta


0.2


Primary Th2 rest


0.0


Small airway epithelium none


0.5


Primary Tr1 rest


0.3


Small airway epithelium TNFalpha + IL-1beta


0.1


CD45RA CD4 lymphocyte act


0.3


Coronary artery SMC rest


0.0


CD45RO CD4 lymphocyte act


0.6


Coronary artery SMC TNFalpha + IL-1beta


0.4


CD8 lymphocyte act


0.3


Astrocytes rest


0.0


Secondary CD8 lymphocyte rest


0.0


Astrocytes TNFalpha + IL-1beta


0.0


Secondary CD8 lymphocyte act


0.0


KU-812 (Basophil) rest


0.4


CD4 lymphocyte none


0.7


KU-812 (Basophil) PMA/ionomycin


0.4



497 



WO 03/023002 



PCT/US02/28539



2ry Th1/Th2/Tr1_anti-CD95 CH11


0.2


CCD1106 (Keratinocytes) none


0.0


LAK cells rest


0.2


CCD1106 (Keratinocytes) TNFalpha + IL-1beta


0.2


LAK cells IL-2

3 _ . , 




Liver cirrhosis


0.2 


LAK cells IL-2+IL-12


!o 6 

IV/. u 




0.0 


LAK cells IL-2+IFN gamma


0.0


NCI-H292 IL-4


0.1


LAK cells IL-2+ IL-18


0.0


NCI-H292 IL-9


0.0


LAK cells PMA/ionomycin


0.3


NCI-H292 IL-13


0.3


NK Cells IL-2 rest


0.3


NCI-H292 IFN gamma


0.0


Two Way MLR 3 day


0.1


HPAEC none


0.1


Two Way MLR 5 day


0.0


HPAEC TNF alpha + IL-1 beta


0.3


Two Way MLR 7 day


0.5


Lung fibroblast none


0.0


PBMC rest


0.0


Lung fibroblast TNF alpha + IL-1 beta 


0.3 


PBMC PWM 


"M , 


Lung fibroblast IL-4 


0.0 


PBMC PHA-L


0.2


Lung fibroblast IL-9


0.5


Ramos (B cell) none


0.1


Lung fibroblast IL-13


0.0


Ramos (B cell) ionomycin


0.1


Lung fibroblast IFN gamma


0.2


B lymphocytes PWM


0.5


Dermal fibroblast CCD1070 rest


0.1


B lymphocytes CD40L and IL-4


0.2


Dermal fibroblast CCD1070 TNF alpha


0.0


EOL-1 dbcAMP


0.0


Dermal fibroblast CCD1070 IL-1 beta


0.0


EOL-1 dbcAMP PMA/ionomycin


0.0


Dermal fibroblast IFN gamma


0.0


Dendritic cells none




Dermal fibroblast IL-4


0.0


Dendritic cells LPS


0.0


Dermal Fibroblasts rest


0.2


Dendritic cells anti-CD40


0.3


Neutrophils TNFa+LPS


0.0


Monocytes rest


0.0


Neutrophils rest


0.0


Monocytes LPS


0.2


Colon


0.0


Macrophages rest


0.0


Lung


0.0


Macrophages LPS


0.0


Thymus


0.4


HUVEC none


0.0


Kidney


0.9


HUVEC starved


0.6







General_screening_panel_v1.5 Summary: Ag4915 Highest expression of this gene
is detected in testis (CT=34.3). Thus, expression of this gene could be used to differentiate
between this sample and other samples on this panel and as a marker of testicular tissue.
Therapeutic modulation of the expression or function of this gene may be useful in the
treatment of male infertility and hypogonadism.

In addition, low levels of expression of this gene are also seen in a number of cancer cell
lines derived from ovarian, lung and colon cancers. Therefore, therapeutic modulation of this gene
may be useful in the treatment of ovarian, lung and colon cancer. Furthermore, expression of
this gene may be used as a diagnostic marker for the detection of colon, lung and ovarian
cancers.

Panel 4.1D Summary: Ag4915 Moderate levels of expression of this gene are
restricted to secondary Tr1 cells (CT=28.8). Thus, expression of this gene may be used to
distinguish Tr1 cells from other samples used in this panel. Furthermore, expression of this
gene in resting Tr1 cells suggests a role for this gene in T lymphocyte activation. Therefore,
therapeutic modulation of this gene or its protein product may be useful in the treatment of T
cell-mediated autoimmune and inflammatory diseases.
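
In each expression table the sample with the lowest CT is reported as 100.0 in the Rel. Exp.(%) column. A minimal sketch of one way such a scaling can be computed is given here, assuming ideal amplification (template doubles every cycle) and no further normalization. The function name and the second CT value are hypothetical and are used only for illustration; only the testis CT of 34.3 comes from the summary above.

```python
def relative_expression_percent(ct_values, efficiency=2.0):
    """Scale CT values so the lowest-CT (most abundant) sample reads 100%.

    Assumption: each PCR cycle multiplies template by `efficiency`
    (2.0 corresponds to ideal doubling).
    """
    ct_min = min(ct_values.values())
    return {
        sample: 100.0 * efficiency ** (ct_min - ct)
        for sample, ct in ct_values.items()
    }


# Testis CT=34.3 is taken from the Ag4915 summary; the other CT is hypothetical.
example = relative_expression_percent(
    {"Testis Pool": 34.3, "hypothetical sample": 37.3}
)
```

Under these assumptions a sample three cycles later than the reference carries roughly one eighth of the reference signal.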

L. NOV64a: Ankyrin-repeat containing protein 

Expression of gene NOV64a was assessed using the primer-probe set Ag4950,
described in Table LA. Results of the RTQ-PCR runs are shown in Tables LB and LC.
Table LA. Probe Name Ag4950

Primers  Sequences                                   Length  Start Position  SEQ ID No
Forward  5'-cttgagtgttgctgctaagca-3'                 21      900             378
Probe    TET-5'-tccccgagaaagtgtcagagccttta-3'-TAMRA  26      926             379
Reverse  5'-tccttttccatgggaaggt-3'                   19      957             380
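
Because the primer tables in this text were recovered from a scanned layout, a quick consistency check between each sequence and its stated length can catch transcription errors. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the entries are the Table LA values as read from the source.

```python
# Primer/probe entries transcribed from Table LA (Ag4950): (sequence, stated length).
AG4950 = {
    "Forward": ("cttgagtgttgctgctaagca", 21),
    "Probe": ("tccccgagaaagtgtcagagccttta", 26),
    "Reverse": ("tccttttccatgggaaggt", 19),
}


def length_mismatches(primer_set):
    """Return the names whose sequence length differs from the stated length
    or whose sequence contains characters other than a, c, g, t."""
    bad = []
    for name, (seq, stated_length) in primer_set.items():
        has_bad_chars = bool(set(seq.lower()) - set("acgt"))
        if len(seq) != stated_length or has_bad_chars:
            bad.append(name)
    return bad


mismatches = length_mismatches(AG4950)
print(mismatches)
```

An empty list means every sequence agrees with the Length column.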



Table LB. General_screening_panel_v1.5



Tissue Name


Rel. Exp.(%) Ag4950, Run 228850857


Tissue Name


Rel. Exp.(%) Ag4950, Run 228850857


Adipose 


4.1 


Renal ca. TK-10 


21.9 


Melanoma* Hs688(A).T 


3.6 


Bladder 


19.3 


Melanoma* Hs688(B).T 


4.0 


Gastric ca. (liver met.) NCI-N87 


50.7 


Melanoma* M14


0.0 


Gastric ca. KATO III 


5 F.I " 


Melanoma* LOXIMVI


0.0 


Colon ca. SW-948 


7.6 


Melanoma* SK-MEL-5 


15.4 


Colon ca. SW480 


14.9 


Squamous cell carcinoma SCC-4 


0.7 


Colon ca.* (SW480 met) SW620 


20.7 


Testis Pool 


100.0 


Colon ca. HT29 


19.1 


Prostate ca.* (bone met) PC-3 


4.8 


Colon ca. HCT-116 


3.9 


Prostate Pool 


2.9 


Colon ca. CaCo-2 


30.1 


Placenta 


8.8 


Colon cancer tissue


21.0


Uterus Pool 


2.1 


Colon ca. SW1116


2.8 


Ovarian ca. OVCAR-3 


2.9 


Colon ca. Colo-205 


4.4 


Ovarian ca. SK-OV-3 


0.7 


Colon ca. SW-48 


6 .T " 


Ovarian ca. OVCAR-4 


0.6 


Colon Pool 


0.0


Ovarian ca. OVCAR-5


46.7


Small Intestine Pool


jO.J


Ovarian ca. IGROV-1


0.0


Stomach Pool




Ovarian ca. OVCAR-8


4.5


Bone Marrow Pool


j.Z


Ovary


1.5


Fetal Heart


1 1


Breast ca. MCF-7


14.6


Heart Pool


1 0
1 .1


Breast ca. MDA-MB-231


22.2


Lymph Node Pool




Breast ca. BT 549


0.5


Fetal Skeletal Muscle


a n


Breast ca. T47D


I 1 .4


Skeletal Muscle Pool


1 St 1
\ O.J


Breast ca. MDA-N


9.3


Spleen Pool


1U. /


Breast Pool


1.3


Thymus Pool




Trachea


9.0


CNS cancer (glio/astro) U87-MG


3. j


Lung


0.0


CNS cancer (glio/astro) U-118-MG


A 1 ^


Fetal Lung


12.9


CNS cancer (neuro;met) SK-N-AS


n i


Lung ca. NCI-N417


0.0


CNS cancer (astro) SF-539


o.u


Lung ca. LX-1


7.3


CNS cancer (astro) SNB-75


j ID. 1


Lung ca. NCI-H146


0.0


CNS cancer (glio) SNB-19




Lung ca. SHP-77


0.2


CNS cancer (glio) SF-295


1*2 O


Lung ca. A549


20.4


Brain (Amygdala) Pool


^ Q


Lung ca. NCI-H526


0.1


Brain (cerebellum)


U. 1


Lung ca. NCI-H23


5.5


Brain (fetal)


z.U


Lung ca. NCI-H460


10.2


Brain (Hippocampus) Pool


o i
L. I


Lung ca. HOP-62


3.6


Cerebral Cortex Pool


inn


Lung ca. NCI-H522


69.3


Brain (Substantia nigra) Pool


o.u


Liver


0.0


Brain (Thalamus) Pool


O. /


Fetal Liver 


0.0 


Brain (whole) 


0.0 


Liver ca. HepG2 


0.0 


Spinal Cord Pool 


4.9 


Kidney Pool 


3.4 


Adrenal Gland 


1.6 


Fetal Kidney 


4.4 


Pituitary gland Pool 


1.8 


Renal ca. 786-0 


10.4 


Salivary Gland 


1.6 


Renal ca. A498 


4.2 


Thyroid (female) 


1.8 


Renal ca. ACHN


6.1 


Pancreatic ca. CAPAN2 


8.9 


Renal ca. UO-31


10.6 


Pancreas Pool 


4.9 


Table LC. Panel 4.1D


Tissue Name


Rel. Exp.(%) Ag4950, Run 223626816


Tissue Name


Rel. Exp.(%) Ag4950, Run 223626816


Secondary Th1 act


59.5


HUVEC IL-1beta


1.0


Secondary Th2 act


79.6


HUVEC IFN gamma


0.0


Secondary Tr1 act


100.0


HUVEC TNF alpha + IFN gamma


0.0


Secondary Th1 rest


9.1


HUVEC TNF alpha + IL4


0.0


Secondary Th2 rest


3.0


HUVEC IL-11


R.8 


Secondary Tr1 rest


15.0 


Lung Microvascular EC none 


h.6 


Primary Th1 act




Lung Microvascular EC TNFalpha + IL-1beta

ft 7 
v. / 


Primary Th2 act


11.5


Microvascular Dermal EC none 


0.0 


Primary Tr1 act




Microvascular Dermal EC TNFalpha + IL-1beta


ft ft 


Primary Th1 rest


2.7


Bronchial epithelium TNFalpha + IL-1beta


4.1


Primary Th2 rest


3.5


Small airway epithelium none


0.0


Primary Tr1 rest


12.9


Small airway epithelium TNFalpha + IL-1beta


0.0


CD45RA CD4 lymphocyte act


4.3


Coronary artery SMC rest


0.0


CD45RO CD4 lymphocyte act


16.4


Coronary artery SMC TNFalpha + IL-1beta


0.0


CD8 lymphocyte act


18.9


Astrocytes rest


0.0


Secondary CD8 lymphocyte rest


22.5


Astrocytes TNFalpha + IL-1beta


0.0


Secondary CD8 lymphocyte act


2.8


KU-812 (Basophil) rest


4.0


CD4 lymphocyte none


20.2


KU-812 (Basophil) PMA/ionomycin


2.0


2ry Th1/Th2/Tr1_anti-CD95 CH11


11.0


CCD1106 (Keratinocytes) none


0.0


LAK cells rest


15.4


CCD1106 (Keratinocytes) TNFalpha + IL-1beta


0.0


LAK cells IL-2


28.3


Liver cirrhosis


0.0


LAK cells IL-2+IL-12


28.7


NCI-H292 none


31.2


LAK cells IL-2+IFN gamma


26.4


NCI-H292 IL-4


31.6


LAK cells IL-2+ IL-18


24.1


NCI-H292 IL-9


41.5


LAK cells PMA/ionomycin


9.7


NCI-H292 IL-13


15.0


NK Cells IL-2 rest


32.5


NCI-H292 IFN gamma


35.8


Two Way MLR 3 day


20.9


HPAEC none


0.7


Two Way MLR 5 day


6.5


HPAEC TNF alpha + IL-1 beta


0.0


Two Way MLR 7 day


20.2


Lung fibroblast none


0.0


PBMC rest


10.8


Lung fibroblast TNF alpha + IL-1 beta


0.0


PBMC PWM


JU.O 


Lung fibroblast IL-4


0.0


PBMC PHA-L


i n 


Lung fibroblast IL-9 0.0 


Ramos (B cell) none


i i 
i . i 


Lung fibroblast IL-13


0.0


Ramos (B cell) ionomycin




Lung fibroblast IFN gamma


0.0


B lymphocytes PWM


5.3


Dermal fibroblast CCD1070 rest


0.0


B lymphocytes CD40L and IL-4


3.0


Dermal fibroblast CCD1070 TNF alpha


1.4


EOL-1 dbcAMP


0.0


Dermal fibroblast CCD1070 IL-1 beta


1.4


EOL-1 dbcAMP PMA/ionomycin


0.0


Dermal fibroblast IFN gamma


0.0


Dendritic cells none


6.1


Dermal fibroblast IL-4


0.0


Dendritic cells LPS


0.0


Dermal Fibroblasts rest 1 1 .5 


Dendritic cells anti-CD40


0.0


Neutrophils TNFa+LPS


0.0


Monocytes rest


0.0


Neutrophils rest 


';■> i 


Monocytes LPS




0.0 


Colon 


b.o " 


Macrophages rest


1.5 


Lung 


6.3


Macrophages LPS 


0.0 


Thymus 


52.5 


HUVEC none 


0.0


Kidney 


9.2


HUVEC starved 


0.6





General_screening_panel_v1.5 Summary: Ag4950 Highest expression of this gene
is detected in testis (CT=29.3). Therefore, therapeutic modulation of the expression or
function of this gene may be useful in the treatment of male infertility and hypogonadism.

Moderate to low levels of expression of this gene are also seen in a number of cancer cell
lines derived from pancreatic, gastric, colon, lung, renal, breast, ovarian, prostate, squamous
cell carcinoma, melanoma and brain cancers. Thus, expression of this gene could be used as a
marker to detect the presence of these cancers. Furthermore, therapeutic modulation of the
expression or function of this gene may be effective in the treatment of pancreatic, gastric,
colon, lung, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain
cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at
moderate to low levels in pancreas, adipose, skeletal muscle, and the gastrointestinal tract.
Therefore, therapeutic modulation of the activity of this gene may prove useful in the
treatment of endocrine/metabolically related diseases, such as obesity and diabetes.

In addition, this gene is expressed at low levels in all regions of the central nervous
system examined, including amygdala, hippocampus, substantia nigra, thalamus, cerebellum,
cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene product may
be useful in the treatment of central nervous system disorders such as Alzheimer's disease,
Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and depression.

Interestingly, this gene is expressed at much higher levels in fetal lung (CT=32) than
in adult lung (CT=40). This observation suggests that expression of this gene can
be used to distinguish fetal from adult lung. In addition, the relative overexpression of this
gene in fetal tissue suggests that the protein product may enhance lung growth or
development in the fetus and thus may also act in a regenerative capacity in the adult.
Therefore, therapeutic modulation of the protein encoded by this gene could be useful in
treatment of lung related diseases.

Panel 4.1D Summary: Ag4950 Highest expression of this gene is detected in
activated secondary Tr1 cells (CT=32). In addition, moderate to low levels of expression of
this gene are also seen in activated polarized T cells, memory T cells, LAK cells, activated
PBMC, mucoepidermoid NCI-H292 cells, and thymus. Expression of this gene is upregulated
in activated secondary polarized T cells as well as in PBMC cells. Thus, this gene may be
involved in the activation of T and B cells. Therefore, modulation of the gene product with a
functional therapeutic may lead to the alteration of functions associated with these cell types
and lead to improvement of the symptoms of patients suffering from autoimmune and
inflammatory diseases such as asthma, allergies, inflammatory bowel disease, lupus
erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis.
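
The fetal versus adult lung comparison in the General_screening_panel_v1.5 summary for Ag4950 (CT=32 versus CT=40) can be turned into an approximate fold difference. The sketch below assumes ideal doubling per cycle and equal input for both samples; neither assumption is stated in this section, and the function name is hypothetical.

```python
def fold_difference(ct_a, ct_b, efficiency=2.0):
    """Fold excess of sample A over sample B implied by their CT values.

    A lower CT means more starting template. Assumes `efficiency`-fold
    amplification per cycle (2.0 = ideal doubling) and equal input.
    """
    return efficiency ** (ct_b - ct_a)


# Fetal lung (CT=32) versus adult lung (CT=40), values from the summary above.
fetal_vs_adult = fold_difference(ct_a=32, ct_b=40)
```

Under these assumptions an eight-cycle gap corresponds to a 256-fold difference. If CT=40 marks the last cycle of the run (not stated here), the adult value is better read as "at or below detection", and the fold difference as a lower bound.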

M. NOV65a: MULTIDOMAIN PRESYNAPTIC CYTOMATRIX PROTEIN 
PICCOLO 

Expression of gene NOV65a was assessed using the primer-probe set Ag4951,
described in Table MA. Results of the RTQ-PCR runs are shown in Tables MB and MC.

Table MA. Probe Name Ag4951

Primers  Sequences                                    Length  Start Position  SEQ ID No
Forward  5'-tgcactgaatgcaagaatca-3'                   20      3133            381
Probe    TET-5'-tctctgtggatttaaccctacaccaca-3'-TAMRA  27      3162            382
Reverse  5'-agccattcttgaatctcagtca-3'                 22      3191





Table MB. CNS_neurodegeneration_v1.0



Tissue Name


Rel. Exp.(%) Ag4951, Run 249286336


Tissue Name


Rel. Exp.(%) Ag4951, Run 249286336


AD 1 Hippo


8.4


Control (Path) 3 Temporal Ctx


3.9


AD 2 Hippo


25.7


Control (Path) 4 Temporal Ctx


57.4


AD 3 Hippo


8.1


AD 1 Occipital Ctx


18.4


AD 4 Hippo


6.6


AD 2 Occipital Ctx (Missing)


T5.6


AD 5 Hippo


77.9


AD 3 Occipital Ctx


6.0


AD 6 Hippo


53.2


AD 4 Occipital Ctx


25.5


Control 2 Hippo


43.2


AD 5 Occipital Ctx


53.6


Control 4 Hippo


3.4


AD 6 Occipital Ctx


24.1


Control (Path) 3 Hippo


3.8


Control 1 Occipital Ctx


j 1 3


AD 1 Temporal Ctx


17.4


Control 2 Occipital Ctx


42.9


AD 2 Temporal Ctx


31.4


Control 3 Occipital Ctx


27.0


AD 3 Temporal Ctx


6.6


Control 4 Occipital Ctx


13.4


AD 4 Temporal Ctx


22.1


Control (Path) 1 Occipital Ctx


98.6


AD 5 Inf Temporal Ctx


62.4


Control (Path) 2 Occipital Ctx


17.8


AD 5 Sup Temporal Ctx


34.4


Control (Path) 3 Occipital Ctx


0.8


AD 6 Inf Temporal Ctx


52.1


Control (Path) 4 Occipital Ctx


31.9


AD 6 Sup Temporal Ctx


58.6


Control 1 Parietal Ctx


4.7


Control 1 Temporal Ctx


3.5


Control 2 Parietal Ctx


36.3


Control 2 Temporal Ctx


32.8


Control 3 Parietal Ctx


13.3


Control 3 Temporal Ctx


27.4


Control (Path) 1 Parietal Ctx


92.0


Control 3 Temporal Ctx


4.0


Control (Path) 2 Parietal Ctx


28.5


Control (Path) 1 Temporal Ctx


100.0


Control (Path) 3 Parietal Ctx


2.5


Control (Path) 2 Temporal Ctx


54.3


Control (Path) 4 Parietal Ctx


65.5



Table MC. Panel 4.1D



Tissue Name


Rel. Exp.(%) Ag4951, Run 223626818


Tissue Name


Rel. Exp.(%) Ag4951, Run 223626818


Secondary Th1 act


0.0


HUVEC IL-1beta


;r 3 .5 


Secondary Th2 act


0.0


HUVEC IFN gamma


0.4


Secondary Tr1 act


0.0


HUVEC TNF alpha + IFN gamma


0.0


Secondary Th1 rest


0.0


HUVEC TNF alpha + IL4


0.9


Secondary Th2 rest


0.0


HUVEC IL-11


1.6


Secondary Tr1 rest


0.0


Lung Microvascular EC none


9.1


Primary Th1 act


0.0


Lung Microvascular EC TNFalpha + IL-1beta


2.3


Primary Th2 act


1.2


Microvascular Dermal EC none


11.6


Primary Tr1 act


0.0


Microvascular Dermal EC TNFalpha + IL-1beta




Primary Th1 rest


0.0


Bronchial epithelium TNFalpha + IL-1beta


40.3


Primary Th2 rest


0.0


Small airway epithelium none


3.3


Primary Tr1 rest


0.0


Small airway epithelium TNFalpha + IL-1beta




CD45RA CD4 lymphocyte act


0.0


Coronary artery SMC rest


0.7


CD45RO CD4 lymphocyte act


0.0


Coronary artery SMC TNFalpha + IL-1beta


0.8


CD8 lymphocyte act


0.0


Astrocytes rest


ir.7


Secondary CD8 lymphocyte rest


0.0


Astrocytes TNFalpha + IL-1beta


2.1


Secondary CD8 lymphocyte act


0.0


KU-812 (Basophil) rest


0.0


CD4 lymphocyte none


0.0


KU-812 (Basophil) PMA/ionomycin


~[o76~ 






2ry Th1/Th2/Tr1_anti-CD95 CH11


0.0


CCD1106 (Keratinocytes) none


44.4


LAK cells rest


0.0


CCD1106 (Keratinocytes) TNFalpha + IL-1beta


36.6


LAK cells IL-2


0.0


Liver cirrhosis


25.0


LAK cells IL-2+IL-12


0.0


NCI-H292 none


14.4


LAK cells IL-2+IFN gamma


0.0


NCI-H292 IL-4


18.7


LAK cells IL-2+ IL-18


0.0


NCI-H292 IL-9


39.0


LAK cells PMA/ionomycin


0.0


NCI-H292 IL-13


26.8


NK Cells IL-2 rest


0.0


NCI-H292 IFN gamma


23.3


Two Way MLR 3 day


0.0


HPAEC none


0.0


Two Way MLR 5 day


0.0


HPAEC TNF alpha + IL-1 beta


0.0


Two Way MLR 7 day


0.4


Lung fibroblast none


0.0


PBMC rest


0.0


Lung fibroblast TNF alpha + IL-1 beta


0.7


PBMC PWM


0.0


Lung fibroblast IL-4


0.0


PBMC PHA-L


0.0


Lung fibroblast IL-9


0.0


Ramos (B cell) none


42.3


Lung fibroblast IL-13


0.0


Ramos (B cell) ionomycin


47.0


Lung fibroblast IFN gamma


0.0


B lymphocytes PWM


0.0


Dermal fibroblast CCD1070 rest


4.6


B lymphocytes CD40L and IL-4


0.0


Dermal fibroblast CCD1070 TNF alpha


1.0


EOL-1 dbcAMP


0.0


Dermal fibroblast CCD1070 IL-1 beta


0.0


EOL-1 dbcAMP PMA/ionomycin


0.0


Dermal fibroblast IFN gamma


0.8


Dendritic cells none


0.0


Dermal fibroblast IL-4


0.0


Dendritic cells LPS


0.0


Dermal Fibroblasts rest


0.0


Dendritic cells anti-CD40


0.0


Neutrophils TNFa+LPS


0.0


Monocytes rest


0.0


Neutrophils rest


0.0


Monocytes LPS


0.0


Colon


5.7


Macrophages rest


0.8


Lung


17.8


Macrophages LPS


0.0


Thymus


38.2


HUVEC none


2.2


Kidney


100.0


HUVEC starved


2.5





CNS_neurodegeneration_v1.0 Summary: Ag4951 Expression of this gene is
ubiquitous throughout the samples in this panel, with highest expression in the temporal
cortex of a control patient with a pathological condition (CT=26.4). While no association
between the expression of this gene and the presence of Alzheimer's disease is detected in
this panel, these results confirm the expression of this gene in areas that degenerate in
Alzheimer's disease, including the cortex, hippocampus, amygdala and thalamus. Expression
of this gene in brain suggests that this gene may play a role in central nervous system
disorders such as Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis,
schizophrenia and depression.

Panel 4.1D Summary: Ag4951 Highest expression of this gene is detected in kidney
(CT=30.1). In addition, moderate to low levels of expression of this gene are also seen in
colon, lung, thymus, Ramos B cells, lung microvascular endothelial cells, cytokine activated
small airway epithelium, keratinocytes, mucoepidermoid NCI-H292 cells and liver cirrhosis
samples. Therefore, therapeutic modulation of this gene may be useful in the treatment of
autoimmune and inflammatory diseases such as asthma, allergies, inflammatory bowel
disease, lupus erythematosus, psoriasis, rheumatoid arthritis, osteoarthritis and liver cirrhosis.

N. NOV66a and NOV66b: human ortholog of rat CYTOSOLIC SORTING PROTEIN
PACS-1A 

Expression of gene NOV66a and variant NOV66b was assessed using the primer- 
probe sets Ag4956 and Ag4960, described in Tables NA and NB. Results of the RTQ-PCR 
runs are shown in Tables NC and ND. 

Table NA. Probe Name Ag4956

Primers  Sequences                                   Length  Start Position  SEQ ID No
Forward  5'-gaagatctccggaaagtgaaga-3'                22      923             384
Probe    TET-5'-cccggaggaaactaacctcaacctct-3'-TAMRA  26      948             385
Reverse  5'-gatgttaggttgccttgtgatg-3'                22      976             386


Table NB. Probe Name Ag4960

Primers  Sequences                                   Length  Start Position  SEQ ID No
Forward  5'-gaagatctccggaaagtgaaga-3'                22      923             387
Probe    TET-5'-cccggaggaaactaacctcaacctct-3'-TAMRA  26      948             388
Reverse  5'-gatgttaggttgccttgtgatg-3'                22      976             389
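
Tables NA and NB list the same oligonucleotides under two probe names. The sketch below checks that the two sets match and estimates the amplicon length from the listed coordinates. The sequences are as best read from the source. The amplicon calculation assumes that both start positions are coordinates on the same strand and that the reverse primer's start is its lowest coordinate; this convention is not stated in the tables. All names are hypothetical.

```python
AG4956 = {
    "Forward": "gaagatctccggaaagtgaaga",
    "Probe": "cccggaggaaactaacctcaacctct",
    "Reverse": "gatgttaggttgccttgtgatg",
}
AG4960 = {
    "Forward": "gaagatctccggaaagtgaaga",
    "Probe": "cccggaggaaactaacctcaacctct",
    "Reverse": "gatgttaggttgccttgtgatg",
}


def same_primer_set(a, b):
    """True when both sets contain identical forward, probe and reverse oligos."""
    return a == b


def amplicon_length(forward_start, reverse_start, reverse_length):
    """Amplicon length, assuming same-strand coordinates and that the reverse
    primer's start position is its lowest coordinate (an assumption)."""
    return reverse_start + reverse_length - forward_start


identical = same_primer_set(AG4956, AG4960)
length_bp = amplicon_length(forward_start=923, reverse_start=976, reverse_length=22)
```

Under the stated coordinate assumption the amplicon spans positions 923 to 997, i.e. 75 bases, with the probe (948 to 973) lying between the two primers.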



Table NC. General_screening_panel_v1.5



Tissue Name


Rel. Exp.(%) Ag4956, Run 228886975


Rel. Exp.(%) Ag4960, Run 228886991


Tissue Name


Rel. Exp.(%) Ag4956, Run 228886975


Rel. Exp.(%) Ag4960, Run 228886991


Adipose


15.1 


!2.2 

+ 


Renal ca. TK-10 


26.6 


27.7 


Melanoma* Hs688(A).T


79.6


84.1


Bladder


30.4


2.5


Melanoma* Hs688(B).T


84.7


81.8


Gastric ca. (liver met.) NCI-N87


56.6


45.1


Melanoma* M14


32.3


31.2


Gastric ca. KATO III


44.4


41.5


Melanoma* LOXIMVI


45.4


40.6


Colon ca. SW-948


4.7


4.5


Melanoma* SK- i ylA/: 
,MEL-5 ] 


1 

46.0


Colon ca. SW480

! 


HO.V 


HJ.J 


.Squamous cell ^ 7 
jcarcinoma SCC-4 j 


32.5 


Colon ca * (SW480 met) 
SW620 


1 o.z. 


17 7 
i / . / 


Testis Pool


23.2


26.2


Colon ca. HT29


8.4


8.0


iProstate ca.* (bone L ? R 
!met) PC-3 j 


34.2 


Colon ca. HCT-116 


41.8 


38.4 


Prostate Pool


9.9


10.4


Colon ca. CaCo-2


35.1


30.6


Placenta


20.2


17.2


Colon cancer tissue


16.5


15.3


Uterus Pool


12.0


13.2


Colon ca. SW1116


8.4 


7.4 


Ovarian ca. OVCAR-3


33.7


33.0


Colon ca. Colo-205


1.8


..5 


Ovarian ca. SK-OV-3


67.8


63.3


Colon ca. SW-48


2.3


1.8


jOvarian ca. OVCAR- j„ 

L- £! 


29.9 


Colon Pool 


22.4 


22.4 


Ovarian ca. OVCAR- L_ . 

\s f A 


38.7 


Small Intestine Pool 


12.9 


10.7 


Ovarian ca. IGROV-1


18.7


14.5 


Stomach Pool 


9.7 


9.9 


[Ovarian ca. OVCAR- 1 . _ - j . . 0 


Bone Marrow Pool 


9.0 


6.7 


Ovary


22.8


20.6


Fetal Heart


19.2


19.5


Breast ca. MCF-7


42.3


42.6


Heart Pool


11.2


9.4


Breast ca. MDA-MB-231


97.3


100.0


Lymph Node Pool


24.8


20.9


Breast ca. BT 549


59.9


58.6


Fetal Skeletal Muscle


6.8


7.6


Breast ca. T47D


12.2


15.9


Skeletal Muscle Pool


13.8


13.8


Breast ca. MDA-N


12.4


14.9


Spleen Pool


11.8


10.7


Breast Pool


23.3


21.2


Thymus Pool


17.3


17.2


Trachea


33.0


id i \CNS cancer (glio/astro) 
IU87-MG 


62.9 


56.3 


Lung 


4.9 


4.7 


CNS cancer (glio/astro) U-118-MG


74.7 


74.7 


Fetal Lung


30.8


31.9 


CNS cancer (neuro;met) 
SK-N-AS 


69.7 


66.4 


Lung ca. NCI-N417


5.6


5.0


CNS cancer (astro) SF-539


20.0


20.6


Lung ca. LX-1


20.4


20.0


CNS cancer (astro) SNB-75


79.0


71.2


Lung ca. NCI-H146


6.9


7.5


CNS cancer (glio) SNB-19


15.0


13.3


Lung ca. SHP-77


17.9


15.6


CNS cancer (glio) SF-295


43.5


43.5






Lung ca. A549


81.2


78.0


|Brain (Amygdala) Pool 1 1 4.2 


8.0 


Lung ca. NCI-H526


9.0


Brain (cerebellum)


45.4


29.9


Lung ca. NCI-H23


19.5


24.5


Brain (fetal)


100.0


77.4


Lung ca. NCI-H460


11.5


12.6


Brain (Hippocampus) Pool


15.6


17.0


Lung ca. HOP-62


26.2


LL.J 


Cerebral Cortex Pool


23.0 


1 R A 


Lung ca. NCI-H522


38.4


39.2


Brain (Substantia nigra) Pool


16.6


12.7


Liver


1.9


1.5


Brain (Thalamus) Pool


25.5


20.2


Fetal Liver


12.7


12.2


Brain (whole)


48.3


31.2


Liver ca. HepG2


13.2


12.9


Spinal Cord Pool


11.0


9.5


Kidney Pool


25.5


25.5


Adrenal Gland


17.3


12.9


Fetal Kidney


11.0


8.7


Pituitary gland Pool


4.1


3.9


Renal ca. 786-0


35.6


34.9


Salivary Gland


57.0


42.0


Renal ca. A498


10.2


8.7


Thyroid (female)


10.3


11.2


Renal ca. ACHN


37.1


39.2


Pancreatic ca. CAPAN2


35.6


36.3


Renal ca. UO-31


86.5


79.0


Pancreas Pool


21.0


21.6



Table ND. Panel 4.1D



Tissue Name


Rel. Exp.(%) Ag4956, Run 223629803


Rel. Exp.(%) Ag4960, Run 223629875


Tissue Name


Rel. Exp.(%) Ag4956, Run 223629803


Rel. Exp.(%) Ag4960, Run 223629875


Secondary Th1 act


39.0


39.2


HUVEC IL-1beta


32.1


32.5


Secondary Th2 act


42.0


31.4


HUVEC IFN gamma


38.2


40.1


Secondary Tr1 act


28.9


10.0


HUVEC TNF alpha + IFN gamma


27.9


25.2


Secondary Th1 rest


36.1


24.3


HUVEC TNF alpha + IL4


26.8


25.9


Secondary Th2 rest


55.1


26.6


HUVEC IL-11


24.7


24.8


Secondary Tr1 rest


61.6


16.4


Lung Microvascular EC none


67.8


69.7


Primary Th1 act


27.5


14.3


Lung Microvascular EC TNFalpha + IL-1beta


51.8




Primary Th2 act


82.4


45.4


Microvascular Dermal EC none


43.5


45.1


Primary Tr1 act


56.3


24.5


Microvascular Dermal EC TNFalpha + IL-1beta


38.7


40.9


Primary Th1 rest


48.0


31.4


Bronchial epithelium TNFalpha + IL-1beta


32.8


35.1


Primary Th2 rest


74.2


49.3


Small airway epithelium none


15.6


15.6


Primary Tr1 rest


73.7


41.2


Small airway epithelium TNFalpha + IL-1beta


24.5


26.6


CD45RA CD4 lymphocyte act


49.3


50.0


Coronary artery SMC rest


63.3


65.1
65.1 

j 


j"CD45ROCD4 \ m g 
jlymphocyte act j 


83.5 


Coronary artery SMC TNFalpha + IL-1beta


55.9


66.0


CD8 lymphocyte act


44.4


51.1


Astrocytes rest


28.9


39.8


Secondary CD8 lymphocyte rest


52.1


54.0


Astrocytes TNFalpha + IL-1beta


31.4


39.5


Secondary CD8 lymphocyte act


30.1


32.1


KU-812 (Basophil) rest


16.3


19.6


CD4 lymphocyte none


38.4


47.6


KU-812 (Basophil) PMA/ionomycin


34.2


38.7


2ry Th1/Th2/Tr1_anti-CD95 CH11


90.8


92.7


CCD1106 (Keratinocytes) none


33.9


36.9


LAK cells rest


37.9


35.1


CCD1106 (Keratinocytes) TNFalpha + IL-1beta


42.0


43.2


LAK cells IL-2


52.9


51.4


Liver cirrhosis


14.9


16.2


LAK cells IL-2+IL-12


27.0


25.2


NCI-H292 none


30.1


29.1


LAK cells IL-2+IFN gamma


40.6


43.2


NCI-H292 IL-4


47.6


50.0


LAK cells IL-2+ IL-18


48.3


45.4


NCI-H292 IL-9


61.6


58.6


LAK cells PMA/ionomycin


16.8


15.1


NCI-H292 IL-13


59.0


53.2


NK Cells IL-2 rest


65.5


59.5


NCI-H292 IFN gamma


53.6


67.4


Two Way MLR 3 day


35.6


38.2


HPAEC none


26.4


24.5


Two Way MLR 5 day


28.9


32.3


HPAEC TNFalpha + IL-1 beta


33.2


33.7


(Two Way MLR 7 day j36.3 j 


45.4 
42.0 


Lung fibroblast none 


45.1 


47.3 


PBMC rest


34.2


Lung fibroblast TNF alpha + IL-1 beta


26.6


30.6


PBMC PWM


31.0


26.2


Lung fibroblast IL-4


46.7


37.1


PBMC PHA-L


56.6


50.0


Lung fibroblast IL-9


47.3


50.3


Ramos (B cell) none


8.1


8.1


Lung fibroblast IL-13


33.7


41.5


Ramos (B cell) ionomycin


8.8


8.2


Lung fibroblast IFN gamma


53.6


57.8


B lymphocytes PWM


34.6


38.7


Dermal fibroblast CCD1070 rest


54.3


50.0


B lymphocytes CD40L and IL-4


63.7


61.6


Dermal fibroblast CCD1070 TNF alpha


100.0


100.0

45.4 


EOL-1 dbcAMP


14.6


14.1


Dermal fibroblast CCD1070 IL-1 beta


35.6


EOL-1 dbcAMP PMA/ionomycin


18.6


20.7


Dermal fibroblast IFN gamma


38.4


32.1


Dendritic cells none 


53.""""""" 


46.™ " 


Dermal fibroblast IL-4 


51.1 


52.9


Dendritic cells LPS


** 1 0 

J 1 


T">8 0 


Dermal Fibroblasts rest


J / .o 


4 1 .3 


Dendritic cells anti-CD40


43.5


57.8


Neutrophils TNFa+LPS


25.7


2,7 


Monocytes rest


39.2


34.2


Neutrophils rest


64.6 


61.1 


Monocytes LPS 


9.2 


1 1 3.0 


Colon


9.2


6.8


Macrophages rest


27.2


42.0


Lung


27.9


24.8


Macrophages LPS


12.9


10.0


Thymus


32.5


31.9


HUVEC none


18.6


19.9


Kidney


16.6


16.4


HUVEC starved


27.9


30.8







General_screening_panel_v1.5 Summary: Ag4956/Ag4960 Two experiments with the
same probe and primer sets are in excellent agreement, with highest expression of this gene
seen in fetal brain and the breast cancer MDA-MB-231 cell line (CTs=25). Moderate to high
levels of expression of this gene are also seen in a cluster of cancer cell lines derived from
pancreatic, gastric, colon, lung, liver, renal, breast, ovarian, prostate, squamous cell
carcinoma, melanoma and brain cancers. Thus, expression of this gene could be used as a
marker to detect the presence of these cancers. Furthermore, therapeutic modulation of the
expression or function of this gene may be effective in the treatment of pancreatic, gastric,
colon, lung, liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and
brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at high to 
moderate levels in pancreas, adipose, adrenal gland, thyroid, pituitary gland, skeletal muscle, 
heart, liver and the gastrointestinal tract. Therefore, therapeutic modulation of the activity of 
1 5 this gene may prove useful in the treatment of endocrine/metabolically related diseases, such 
as obesity and diabetes. 

In addition, this gene is expressed at high levels in all regions of the central nervous 
system examined, including amygdala, hippocampus, substantia nigra, thalamus, cerebellum, 
cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene product may 
20 be useful in the treatment of central nervous system disorders such as Alzheimer's disease, 
Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and depression. 

Panel 4.1D Summary: Ag4956/Ag4960 Two experiments with the same probe and
primer sets are in excellent agreement, with highest expression of this gene seen in TNF alpha
treated dermal fibroblasts (CTs=27-27.5). This gene is expressed at high to moderate levels in
a wide range of cell types of significance in the immune response in health and disease.
These cells include members of the T-cell, B-cell, endothelial cell, macrophage/monocyte,
and peripheral blood mononuclear cell family, as well as epithelial and fibroblast cell types
from lung and skin, and normal tissues represented by colon, lung, thymus and kidney. This
ubiquitous pattern of expression suggests that this gene product may be involved in
homeostatic processes for these and other cell types and tissues. This pattern is in agreement
with the expression profile in General_screening_panel_v1.5 and also suggests a role for the
gene product in cell survival and proliferation. Therefore, modulation of the gene product
with a functional therapeutic may lead to the alteration of functions associated with these cell
types and lead to improvement of the symptoms of patients suffering from autoimmune and
inflammatory diseases such as asthma, allergies, inflammatory bowel disease, lupus
erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis.

O. NOV67a: Formin 2 

Expression of gene NOV67a was assessed using the primer-probe set Ag4959,
described in Table OA. Results of the RTQ-PCR runs are shown in Tables OB and OC.

Table OA. Probe Name Ag4959

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-tgcattgtgatggttttgtatg-3' | 22 | 5198 | 390
Probe | TET-5'-tccattacttttaatgatcttcgttgca-3'-TAMRA | 28 | 5235 | 391
Reverse | 5'-aaauctcccctttatccacaa-3' | 22 | 5271 | 392



Table OB. CNS neurodegeneration v1.0

Tissue Name | Rel. Exp.(%) Ag4959, Run 224735164 | Tissue Name | Rel. Exp.(%) Ag4959, Run 224735164
AD 1 Hippo | 2.9 | Control (Path) 3 Temporal Ctx | 4.7
AD 2 Hippo | 3.9 | Control (Path) 4 Temporal Ctx | 19.9
AD 3 Hippo | 1.1 | AD 1 Occipital Ctx |
AD 4 Hippo | 2.7 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 46.6 | AD 3 Occipital Ctx | 1.2
AD 6 Hippo | 100.0 | AD 4 Occipital Ctx | 4.4
Control 2 Hippo | 6.4 | AD 5 Occipital Ctx | 9.9
Control 4 Hippo | 2.5 | AD 6 Occipital Ctx | 15.7
Control (Path) 3 Hippo | 1.2 | Control 1 Occipital Ctx | [illegible]
AD 1 Temporal Ctx | 5.7 | Control 2 Occipital Ctx | 52.1
AD 2 Temporal Ctx | 7.4 | Control 3 Occipital Ctx | 5.5
AD 3 Temporal Ctx | 0.0 | Control 4 Occipital Ctx | 1.6
AD 4 Temporal Ctx | 3.5 | Control (Path) 1 Occipital Ctx | [illegible]
AD 5 Inf Temporal Ctx | 48.0 | Control (Path) 2 Occipital Ctx | 7.9
AD 5 Sup Temporal Ctx | 17.6 | Control (Path) 3 Occipital Ctx | 0.0
AD 6 Inf Temporal Ctx | 27.7 | Control (Path) 4 Occipital Ctx | 5.8
AD 6 Sup Temporal Ctx | 26.6 | Control 1 Parietal Ctx | 1.4
Control 1 Temporal Ctx | 1.0 | Control 2 Parietal Ctx | 17.6
Control 2 Temporal Ctx | 19.2 | Control 3 Parietal Ctx | 4.6
Control 3 Temporal Ctx | 5.7 | Control (Path) 1 Parietal Ctx | 46.0
Control 4 Temporal Ctx | 2.0 | Control (Path) 2 Parietal Ctx | 4.8
Control (Path) 1 Temporal Ctx | 22.5 | Control (Path) 3 Parietal Ctx | 2.2
Control (Path) 2 Temporal Ctx | 25.9 | Control (Path) 4 Parietal Ctx | 23.5

Table OC. General screening panel v1.5

Tissue Name | Rel. Exp.(%) Ag4959, Run 228886990 | Tissue Name | Rel. Exp.(%) Ag4959, Run 228886990
Adipose | 0.0 | Renal ca. TK-10 | 0.0
Melanoma* Hs688(A).T | 2.3 | Bladder | 0.0
Melanoma* Hs688(B).T | [illegible] | Gastric ca. (liver met.) NCI-N87 | 0.0
Melanoma* M14 | 0.0 | Gastric ca. KATO III | 0.0
Melanoma* LOXIMVI | 15.0 | Colon ca. SW-948 | [illegible]
Melanoma* SK-MEL-5 | 100.0 | Colon ca. SW480 | 0.0
Squamous cell carcinoma SCC-4 | 0.0 | Colon ca.* (SW480 met) SW620 | 0.0
Testis Pool | 0.0 | Colon ca. HT29 | 0.0
Prostate ca.* (bone met) PC-3 | 0.0 | Colon ca. HCT-116 | 0.0
Prostate Pool | 0.0 | Colon ca. CaCo-2 | 0.0
Placenta | 0.0 | Colon cancer tissue | 0.0
Uterus Pool | 0.0 | Colon ca. SW1116 | 0.0
Ovarian ca. OVCAR-3 | 0.0 | Colon ca. Colo-205 | 0.0
Ovarian ca. SK-OV-3 | 0.0 | Colon ca. SW-48 | 0.0
Ovarian ca. OVCAR-4 | 0.0 | Colon Pool | 0.6
Ovarian ca. OVCAR-5 | 0.0 | Small Intestine Pool | 0.0
Ovarian ca. IGROV-1 | 0.0 | Stomach Pool | 0.0
Ovarian ca. OVCAR-8 | 0.0 | Bone Marrow Pool | 0.0
Ovary | 0.0 | Fetal Heart | 0.0
Breast ca. MCF-7 | 0.0 | Heart Pool | 0.0
Breast ca. MDA-MB-231 | 0.0 | Lymph Node Pool | 0.0
Breast ca. BT 549 | 0.0 | Fetal Skeletal Muscle | 0.0
Breast ca. T47D | 0.0 | Skeletal Muscle Pool | 0.0
Breast ca. MDA-N | 0.6 | Spleen Pool | 0.0
Breast Pool | 0.0 | Thymus Pool | 0.0
Trachea | 0.0 | CNS cancer (glio/astro) U87-MG |
Lung | 0.0 | CNS cancer (glio/astro) U-118-MG | 1.3
Fetal Lung | 0.0 | CNS cancer (neuro;met) SK-N-AS | 0.0
Lung ca. NCI-N417 | 0.0 | CNS cancer (astro) SF-539 | 0.0
Lung ca. LX-1 | 0.0 | CNS cancer (astro) SNB-75 | 0.0
Lung ca. NCI-H146 | 0.0 | CNS cancer (glio) SNB-19 | 0.0
Lung ca. SHP-77 | 0.4 | CNS cancer (glio) SF-295 | 0.0
Lung ca. A549 | 0.0 | Brain (Amygdala) Pool | 0.8
Lung ca. NCI-H526 | 0.0 | Brain (cerebellum) | 4.7
Lung ca. NCI-H23 | 0.0 | Brain (fetal) | 1.2
Lung ca. NCI-H460 | 0.7 | Brain (Hippocampus) Pool | 11.2
Lung ca. HOP-62 | 0.0 | Cerebral Cortex Pool | 12.3
Lung ca. NCI-H522 | 0.0 | Brain (Substantia nigra) Pool | 1.0
Liver | 0.0 | Brain (Thalamus) Pool | 2.3
Fetal Liver | 0.0 | Brain (whole) | 2.3
Liver ca. HepG2 | 0.0 | Spinal Cord Pool | 0.8
Kidney Pool | 0.0 | Adrenal Gland | 0.0
Fetal Kidney | 0.0 | Pituitary gland Pool | 0.4
Renal ca. 786-0 | 0.0 | Salivary Gland | 0.0
Renal ca. A498 | 0.0 | Thyroid (female) | 0.0
Renal ca. ACHN | 0.0 | Pancreatic ca. CAPAN2 | 0.0
Renal ca. UO-31 | 0.0 | Pancreas Pool | 0.0



CNS_neurodegeneration_v1.0 Summary: Ag4959 Low levels of expression of this
gene are seen throughout the samples in this panel, with highest expression in the hippocampus
of a patient with Alzheimer's disease (CT=33.3). While no association between the
expression of this gene and the presence of Alzheimer's disease is detected in this panel, these
results confirm the expression of this gene in areas that degenerate in Alzheimer's disease,
including the cortex, hippocampus, amygdala and thalamus.

General_screening_panel_v1.5 Summary: Ag4959 Moderate levels of expression
of this gene are restricted to the melanoma SK-MEL-5 cell line (CT=31.7). Therefore, expression
of this gene may be used to distinguish this sample from other samples used in this panel and
also as a marker to detect the presence of melanoma. Furthermore, therapeutic modulation of
this gene may be useful in the treatment of melanoma.
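
The Rel. Exp.(%) columns in the tables of this section are scaled so that the highest-expressing sample in each run reads 100.0. As a rough sketch of how such a column can be derived from raw CT values, the following assumes ideal amplification (a doubling of product per cycle) and normalization to the lowest CT in the panel. This section does not itself state the formula, so the calculation is an assumption, and every sample name and CT value other than the SK-MEL-5 value of 31.7 is hypothetical.

```python
def relative_expression_percent(ct_by_sample):
    """Convert CT values to percent of the highest-expressing sample.

    Assumes ideal doubling per PCR cycle (an assumption, not stated in the text).
    """
    ct_min = min(ct_by_sample.values())  # lowest CT = highest expression
    return {
        sample: 100.0 * 2.0 ** (ct_min - ct)
        for sample, ct in ct_by_sample.items()
    }


# Only the SK-MEL-5 CT (31.7) is taken from the summary; the others are invented.
panel = {
    "Melanoma SK-MEL-5": 31.7,
    "hypothetical sample A": 35.0,
    "hypothetical sample B": 40.0,
}

for sample, percent in relative_expression_percent(panel).items():
    print(f"{sample}: {percent:.1f}")
```

Under these assumptions a sample whose CT is several cycles above the panel minimum falls to a few percent or less, which is consistent with a table in which one cell line reads 100.0 and most other samples read near 0.0.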

P. NOV69a: F-box domain containing protein

Expression of gene NOV69a was assessed using the primer-probe set Ag4961,
described in Table PA. Results of the RTQ-PCR runs are shown in Tables PB, PC and PD.

Table PA. Probe Name Ag4961

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-cagaggtgccctttagcttact-3' | 22 | 1671 | 393
Probe | TET-5'-ttctcagcgtagattttgtccatcaa-3'-TAMRA | 26 | 1706 | 394
Reverse | 5'-caaatggcggtcatgtataatc-3' | 22 | 1745 | 395
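
A primer-probe table such as Table PA can be sanity-checked mechanically. The sketch below compares each listed length with the length of the transcribed sequence and checks that the forward primer, probe and reverse primer do not overlap. It assumes that each start position is the left-most coordinate of the oligo on a single strand, which the table does not state; the `Oligo` class and `check_oligos` function are illustrative names, not part of the original disclosure.

```python
from dataclasses import dataclass


@dataclass
class Oligo:
    role: str
    sequence: str  # 5'->3', without dye labels
    length: int    # length as listed in the table
    start: int     # start position as listed in the table


# Values transcribed from Table PA (Ag4961).
TABLE_PA = [
    Oligo("Forward", "cagaggtgccctttagcttact", 22, 1671),
    Oligo("Probe", "ttctcagcgtagattttgtccatcaa", 26, 1706),
    Oligo("Reverse", "caaatggcggtcatgtataatc", 22, 1745),
]


def check_oligos(oligos):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    for o in oligos:
        if len(o.sequence) != o.length:
            problems.append(
                f"{o.role}: listed length {o.length}, sequence has {len(o.sequence)}"
            )
        if set(o.sequence) - set("acgt"):
            problems.append(f"{o.role}: non-ACGT characters in sequence")
    # Assumption: start positions are left-most coordinates on one strand.
    ordered = sorted(oligos, key=lambda o: o.start)
    for left, right in zip(ordered, ordered[1:]):
        if left.start + left.length > right.start:
            problems.append(f"{left.role} overlaps {right.role}")
    return problems


print(check_oligos(TABLE_PA) or "no inconsistencies found")
```

A check of this kind is mainly useful for catching transcription errors, for example a sequence whose character count no longer matches the listed length.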



Table PB. CNS neurodegeneration v1.0

Tissue Name | Rel. Exp.(%) Ag4961, Run 224735209 | Tissue Name | Rel. Exp.(%) Ag4961, Run 224735209
AD 1 Hippo | 14.6 | Control (Path) 3 Temporal Ctx | 17.4
AD 2 Hippo | 33.9 | Control (Path) 4 Temporal Ctx | 32.1
AD 3 Hippo | 10.6 | AD 1 Occipital Ctx | 18.0
AD 4 Hippo | 4.9 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 100.0 | AD 3 Occipital Ctx | 12.7
AD 6 Hippo | 77.9 | AD 4 Occipital Ctx | 23.2
Control 2 Hippo | 24.1 | AD 5 Occipital Ctx | 32.8
Control 4 Hippo | 17.9 | AD 6 Occipital Ctx | 34.9
Control (Path) 3 Hippo | 11.5 | Control 1 Occipital Ctx | 8.4
AD 1 Temporal Ctx | 27.0 | Control 2 Occipital Ctx | 38.4
AD 2 Temporal Ctx | 39.0 | Control 3 Occipital Ctx | 19.8
AD 3 Temporal Ctx | 8.1 | Control 4 Occipital Ctx | 9.5
AD 4 Temporal Ctx | 25.5 | Control (Path) 1 Occipital Ctx | 84.7
AD 5 Inf Temporal Ctx | 88.9 | Control (Path) 2 Occipital Ctx | 12.9
AD 5 Sup Temporal Ctx | 43.2 | Control (Path) 3 Occipital Ctx | 6.2
AD 6 Inf Temporal Ctx | 57.0 | Control (Path) 4 Occipital Ctx | 17.0
AD 6 Sup Temporal Ctx | 80.7 | Control 1 Parietal Ctx | 9.6
Control 1 Temporal Ctx | 11.7 | Control 2 Parietal Ctx | 52.5
Control 2 Temporal Ctx | 36.1 | Control 3 Parietal Ctx | 15.1
Control 3 Temporal Ctx | 13.6 | Control (Path) 1 Parietal Ctx | 66.9
Control 4 Temporal Ctx | 13.0 | Control (Path) 2 Parietal Ctx | 25.7
Control (Path) 1 Temporal Ctx | 54.7 | Control (Path) 3 Parietal Ctx | 6.2
Control (Path) 2 Temporal Ctx | 27.5 | Control (Path) 4 Parietal Ctx | 44.1


Table PC. General screening panel v1.5 


Tissue Name 


Rel. 

Exp.(%) 
Ag4961, 1 
Run 

228903662 


. ... ..... 

'issue Name 


Rcl. 

Exp.(%) 
Ag496I, 
Run 

228903662 
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[Adipose 


7 A 


jRenal ca. TK-IO 




zu.u 


■Melanoma hsooo(A). i 


A(\ A 
41). 0 


jBladder 




! 1 2.o 


ijvieianonia' Hsooo(ts). I 


\ac\ o 

[4U.y 


jGastric ca. (liver met.) NCI-N87 




12.6 


jMelanoma* M 1 4 


1 1 f\ 1 


jGastricca. KATOIII 




io.o 


[Melanoma* loaiivivi 


'7fl 7 


jColon ca. SW-948 




5.6 


{Melanoma* bK.-MtLo 


1 Q 7 
I 0./ 


[Colon ca. SW480 




23.3 


jSquamous cell carcinoma SCC-4 




Colon ca.* (SW480 met) SW620 


'293" ~ 


I Test is Pool 


lie 


Colon ca. HT29 


5.3 


Prostate ca.* (bone met) PC-3 


OC A 

25.0 


Colon ca. HCT-116 




50.0 


; Prostate Pool 


1 7 7 
12.3 


Colon ca. CaCo-2 




29.7 


jPIacenta 


1 A. 

I .9 


Colon cancer tissue 


|8.8 


{Uterus Pool 

a-. — -•>■ — — • ~ • — ^ ■-=■-■ 


15.1 


Colon ca. SW1116 


2.7 


Ovarian ca. OVCAR-3 


13.7 


Colon ca. Colo-205 




0.9 


Ovarian ca. oK-UV-j 


54.3 


Colon ca. SW-48 




5.6 


{Ovarian ca. OVLAK-4 


4.3 


Colon Pool 




28.7 


lOvanan ca. OVLAK-o 


1 C A 

15.0 


Small Intestine Pool 




21.2 


[Ovarian ca. IGKOV-1 


OA /I 

20.4 


Stomach Pool 




9.9 


(Ovarian ca. OVCAK-8 


1 7 A 

12.9 


Bone Marrow Pool 


6.1 


Ovary 


1 1 A 

1 1 .9 


Fetal Heart 




7.2 


.Breast ca. MCh-7 


16.3 


Heart Pool 


1 1.8 


jBreast ca. ML)A-MB-2j 1 


24.1 


Lymph Node Pool 


27.4 


i 1,1 " 1 — — — • - - -• 
(Breast ca. B 1 549 


60.3 


Fetal Skeletal Muscle 


4.5 


•Breast ca. 1 47D 


2.9 


Skeletal Muscle Pool 




33.2 


iBreast ca. MUA-N 


9.9 


Spleen Pool 




7.6 


; Breast Pool 


29.5 


Thymus Pool 


11.0 


. I racnea 


6.3 


CNS cancer (glio/astro) U87-MG 




45.4 


Lung 


6.6 


CNS cancer (glio/astro) U- 1 1 8-MG 


138.7 


.reiai Lung 


24.8 


CNS cancer (neuro;met) SK-N-AS 


19.8 


!l imft/»» "MP1 KM 1 7 

jLiingca. iNL-i-fN4i / 


IT- 1 


CNS cancer (astro) SF-539 


! 


14.7 


jLiing ca. la- i 


3 1 .0 {CNS cancer (astro) SNB-75 


fioo.o 


!| nno rci XIPI H 1 Af\ 


8.5 |CNS cancer (glio)SNB- 19 


! 


21.5 


ii iirM? no cup 77 
iL-iing ca. oni - / / 


1 6.2 jCNS cancer (glio) SF-295 


[71.7 


jLiing ca. Aj'ty 


25.~0 [Brain (Amygdala) Pool 




: l linn no. MPl Ux7£ 

: Limg ca. In^i-MjZo 


5.5 j 


Brain (cerebellum) 


[23.3 


'1 nno MPl 140 1 
• Lling Ca. lNV_,l-rlZj 


34.2 jBrain (fetal) 


]26.4 


jLungca. NCI-H460 


37.6 


Brain (Hippocampus) Pool 


! 


8.0 


fcing ca. HOP-62 


13.5 


Cerebral Cortex Pool 


"1 


8.4 


jLung ca. NCI-H522 


20.2 


Brain (Substantia nigra) Pool 


(7.6 


iLiver 


0.2 


Brain (Thalamus) Pool 


l 
i 


14.0 


jFetal Liver 


31.6 


Brain (whole) 


[6.7 


jLiver ca. HepG2 


16.0 


Spinal Cord Pool 


9.9 


Kidney Pool 


32.8 


Adrenal Gland 


|5.3 


[Fetal Kidney 


11.7 


Pituitary gland Pool 


]2~9 
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Renal ca. 786-0 


11.0 


Salivary Gland 


I..6 


Renal ca. A498 


6.2 


Thyroid (female) 


" |7-7 


;Renal ca. ACHN 


7.6 


Pancreatic ca. CAPAN2 


1 1 2.2 


-Renal ca. UO-3 1 


16.8 


Pancreas Pool 


118.7 



Table PD. Panel 4.1D

Tissue Name | Rel. Exp.(%) Ag4961, Run 223691546 | Tissue Name | Rel. Exp.(%) Ag4961, Run 223691546
Secondary Th1 act | 63.7 | HUVEC IL-1beta | 48.6
Secondary Th2 act | 46.7 | HUVEC IFN gamma | 45.7
Secondary Tr1 act | 54.3 | HUVEC TNF alpha + IFN gamma | 28.1
Secondary Th1 rest | 7.7 | HUVEC TNF alpha + IL4 | 30.6
Secondary Th2 rest | 16.8 | HUVEC IL-11 | 15.0
Secondary Tr1 rest | 22.2 | Lung Microvascular EC none | 0.0
Primary Th1 act | 27.5 | Lung Microvascular EC TNFalpha + IL-1beta | 49.3
Primary Th2 act | 37.1 | Microvascular Dermal EC none | [illegible]
Primary Tr1 act | 39.2 | Microvascular Dermal EC TNFalpha + IL-1beta | 32.8
Primary Th1 rest | 6.6 | Bronchial epithelium TNFalpha + IL1beta | 30.8
Primary Th2 rest | 4.8 | Small airway epithelium none | 14.4
Primary Tr1 rest | [illegible] | Small airway epithelium TNFalpha + IL-1beta | 31.0
CD45RA CD4 lymphocyte act | 28.7 | Coronary artery SMC rest | 29.3
CD45RO CD4 lymphocyte act | 25.2 | Coronary artery SMC TNFalpha + IL-1beta | 34.6
CD8 lymphocyte act | 43.2 | Astrocytes rest | 29.5
Secondary CD8 lymphocyte rest | 62.0 | Astrocytes TNFalpha + IL-1beta | 27.5
Secondary CD8 lymphocyte act | 23.2 | KU-812 (Basophil) rest | [illegible]
CD4 lymphocyte none | 5.3 | KU-812 (Basophil) PMA/ionomycin | 100.0
2ry Th1/Th2/Tr1_anti-CD95 CH11 | 16.0 | CCD1106 (Keratinocytes) none | 31.4
LAK cells rest | 32.1 | CCD1106 (Keratinocytes) TNFalpha + IL-1beta | 16.6
LAK cells IL-2 | 30.6 | Liver cirrhosis | 6.8
LAK cells IL-2+IL-12 | 14.5 | NCI-H292 none | 33.0
LAK cells IL-2+IFN gamma | 22.2 | NCI-H292 IL-4 | 47.3
LAK cells IL-2+IL-18 | 29.3 | NCI-H292 IL-9 | 68.8
LAK cells PMA/ionomycin | 47.6 | NCI-H292 IL-13 | 44.1
NK Cells IL-2 rest | 30.1 | NCI-H292 IFN gamma | 32.1
Two Way MLR 3 day | [illegible] | HPAEC none | 25.3
Two Way MLR 5 day | 18.6 | HPAEC TNF alpha + IL-1 beta | 44.8
Two Way MLR 7 day | 17.9 | Lung fibroblast none | 30.6
PBMC rest | 4.9 | Lung fibroblast TNF alpha + IL-1 beta | 18.7
PBMC PWM | 17.0 | Lung fibroblast IL-4 | 24.1
PBMC PHA-L | 18.4 | Lung fibroblast IL-9 | 29.9
Ramos (B cell) none | 44.8 | Lung fibroblast IL-13 | 52.9
Ramos (B cell) ionomycin | 58.6 | Lung fibroblast IFN gamma | 57.0
B lymphocytes PWM | 27.9 | Dermal fibroblast CCD1070 rest | 58.6
B lymphocytes CD40L and IL-4 | 26.4 | Dermal fibroblast CCD1070 TNF alpha | 62.9
EOL-1 dbcAMP | 36.3 | Dermal fibroblast CCD1070 IL-1 beta | 31.0
EOL-1 dbcAMP PMA/ionomycin | 56.3 | Dermal fibroblast IFN gamma | 16.5
Dendritic cells none | 25.9 | Dermal fibroblast IL-4 | 28.1
Dendritic cells LPS | 32.8 | Dermal Fibroblasts rest | 17.0
Dendritic cells anti-CD40 | 25.9 | Neutrophils TNFa+LPS | 7.2
Monocytes rest | 25.7 | Neutrophils rest | 10.2
Monocytes LPS | 50.3 | Colon | 9.6
Macrophages rest | 34.4 | Lung | 15.4
Macrophages LPS | 13.2 | Thymus | 24.5
HUVEC none | 33.2 | Kidney | 19.8
HUVEC starved | 25.9 | |



CNS_neurodegeneration_v1.0 Summary: Ag4961 This panel confirms the
expression of this gene at low levels in the brains of an independent group of individuals.
However, no differential expression of this gene was detected between Alzheimer's diseased
postmortem brains and those of non-demented controls in this experiment. Please see Panel
1.5 for a discussion of the potential role of this gene in treatment of central nervous system
disorders.

General_screening_panel_v1.5 Summary: Ag4961 Highest expression of this gene
is detected in the brain cancer SNB-75 cell line (CT=28.6). Moderate levels of expression of this
gene are also seen in a cluster of cancer cell lines derived from pancreatic, gastric, colon, lung,
liver, renal, breast, ovarian, prostate, squamous cell carcinoma, melanoma and brain cancers.
Thus, expression of this gene could be used as a marker to detect the presence of these
cancers. Furthermore, therapeutic modulation of the expression or function of this gene may
be effective in the treatment of pancreatic, gastric, colon, lung, liver, renal, breast, ovarian,
prostate, squamous cell carcinoma, melanoma and brain cancers.

Among tissues with metabolic or endocrine function, this gene is expressed at
moderate to low levels in pancreas, adipose, adrenal gland, thyroid, pituitary gland, skeletal
muscle, heart, fetal liver and the gastrointestinal tract. Therefore, therapeutic modulation of
the activity of this gene may prove useful in the treatment of endocrine/metabolically related
diseases, such as obesity and diabetes.

Interestingly, this gene is expressed at much higher levels in fetal liver (CT=30.2) when
compared to adult liver (CT=37). This observation suggests that expression of this gene can
be used to distinguish fetal from adult liver. In addition, the relative overexpression of this
gene in fetal tissue suggests that the protein product may enhance liver growth or
development in the fetus and thus may also act in a regenerative capacity in the adult.
Therefore, therapeutic modulation of the protein encoded by this gene could be useful in
treatment of liver related diseases.
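
To give a sense of scale for the fetal versus adult liver comparison, a CT difference can be converted into an approximate fold difference. The sketch below assumes that template abundance scales as (1 + efficiency) raised to the CT difference, with efficiency = 1.0 meaning perfect doubling per cycle; the function name and the efficiency parameter are illustrative assumptions and are not taken from this document. Only the two CT values (30.2 and 37) come from the text.

```python
def fold_difference(ct_low_expr, ct_high_expr, efficiency=1.0):
    """Estimate the fold difference in template between two samples.

    ct_low_expr:  CT of the sample with lower expression (higher CT).
    ct_high_expr: CT of the sample with higher expression (lower CT).
    efficiency:   assumed per-cycle amplification efficiency (1.0 = doubling).
    """
    return (1.0 + efficiency) ** (ct_low_expr - ct_high_expr)


fetal_liver_ct = 30.2  # from the summary above
adult_liver_ct = 37.0  # from the summary above

fold = fold_difference(adult_liver_ct, fetal_liver_ct)
print(f"Approximate fetal/adult fold difference: {fold:.0f}")
```

With perfect doubling, the 6.8-cycle gap corresponds to a difference on the order of a hundred-fold; a lower assumed efficiency gives a smaller estimate.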

In addition, this gene is expressed at moderate levels in all regions of the central
nervous system examined, including amygdala, hippocampus, substantia nigra, thalamus,
cerebellum, cerebral cortex, and spinal cord. Therefore, therapeutic modulation of this gene
product may be useful in the treatment of central nervous system disorders such as
Alzheimer's disease, Parkinson's disease, epilepsy, multiple sclerosis, schizophrenia and
depression.

Panel 4.1D Summary: Ag4961 Highest expression of this gene is detected in
PMA/ionomycin stimulated basophils (CT=30.4). This gene is expressed at high to moderate
levels in a wide range of cell types of significance in the immune response in health and
disease. These cells include members of the T-cell, B-cell, endothelial cell,
macrophage/monocyte, and peripheral blood mononuclear cell family, as well as epithelial
and fibroblast cell types from lung and skin, and normal tissues represented by colon, lung,
thymus and kidney. This ubiquitous pattern of expression suggests that this gene product may
be involved in homeostatic processes for these and other cell types and tissues. This pattern is
in agreement with the expression profile in General_screening_panel_v1.5 and also suggests
a role for the gene product in cell survival and proliferation. Therefore, modulation of the
gene product with a functional therapeutic may lead to the alteration of functions associated
with these cell types and lead to improvement of the symptoms of patients suffering from
autoimmune and inflammatory diseases such as asthma, allergies, inflammatory bowel
disease, lupus erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis.

Q. NOV72a: WW domain containing protein

Expression of gene NOV72a was assessed using the primer-probe set Ag4977, 
described in Table QA. Results of the RTQ-PCR runs are shown in Tables QB and QC. 
Table QA. Probe Name Ag4977

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-taggatgggaagaggcatatg-3' | 21 | 263 | 396
Probe | TET-5'-cttcatagaccacaacaccaaaacca-3'-TAMRA | 26 | 303 | 397
Reverse | 5'-tactcgaggatcctcaatctga-3' | 22 | 330 | 398



Table QB. General screening panel v1.5

Tissue Name | Rel. Exp.(%) Ag4977, Run 228940919 | Tissue Name | Rel. Exp.(%) Ag4977, Run 228940919
Adipose | 0.9 | Renal ca. TK-10 | 40.6
Melanoma* Hs688(A).T | 0.1 | Bladder | 15.0
Melanoma* Hs688(B).T | 0.2 | Gastric ca. (liver met.) NCI-N87 | 23.5
Melanoma* M14 | 3.5 | Gastric ca. KATO III | 59.5
Melanoma* LOXIMVI | 6.3 | Colon ca. SW-948 | 17.4
Melanoma* SK-MEL-5 | [illegible] | Colon ca. SW480 |
Squamous cell carcinoma SCC-4 | | Colon ca.* (SW480 met) SW620 | 5.8
Testis Pool | 0.5 | Colon ca. HT29 |
Prostate ca.* (bone met) PC-3 | 2.3 | Colon ca. HCT-116 | 21.3
Prostate Pool | 4.1 | Colon ca. CaCo-2 | [illegible]
Placenta | 18.7 | Colon cancer tissue | 4.6
Uterus Pool | 0.2 | Colon ca. SW1116 | 1.5
Ovarian ca. OVCAR-3 | 12.8 | Colon ca. Colo-205 |
Ovarian ca. SK-OV-3 | 31.2 | Colon ca. SW-48 | [illegible]
Ovarian ca. OVCAR-4 | 66.0 | Colon Pool | 0.3
Ovarian ca. OVCAR-5 | 100.0 | Small Intestine Pool | 6.4
Ovarian ca. IGROV-1 | 18.3 | Stomach Pool |
Ovarian ca. OVCAR-8 | 6.2 | Bone Marrow Pool | 0.3
Ovary | 1.9 | Fetal Heart | 0.0
Breast ca. MCF-7 | 7.0 | Heart Pool | 0.0
Breast ca. MDA-MB-231 | 18.0 | Lymph Node Pool | 0.6
Breast ca. BT 549 | 2.0 | Fetal Skeletal Muscle | 6.6
Breast ca. T47D | 6.3 | Skeletal Muscle Pool | 0.0
Breast ca. MDA-N | 0.0 | Spleen Pool | 0.1
Breast Pool | 1.6 | Thymus Pool | 1.5
Trachea | 4.7 | CNS cancer (glio/astro) U87-MG | 0.7
Lung | 0.6 | CNS cancer (glio/astro) U-118-MG | 0.0
Fetal Lung | 4.9 | CNS cancer (neuro;met) SK-N-AS | 13.1
Lung ca. NCI-N417 | 0.3 | CNS cancer (astro) SF-539 | 0.3
Lung ca. LX-1 | 6.7 | CNS cancer (astro) SNB-75 | 0.9
Lung ca. NCI-H146 | 2.4 | CNS cancer (glio) SNB-19 | 18.6
Lung ca. SHP-77 | 0.9 | CNS cancer (glio) SF-295 | 2.2
Lung ca. A549 | [illegible] | Brain (Amygdala) Pool | 3.8
Lung ca. NCI-H526 | 2.0 | Brain (cerebellum) | 8.0
Lung ca. NCI-H23 | 3.2 | Brain (fetal) | 15.9
Lung ca. NCI-H460 | 1.6 | Brain (Hippocampus) Pool | 5.0
Lung ca. HOP-62 | 1.8 | Cerebral Cortex Pool | 3.9
Lung ca. NCI-H522 | 6.6 | Brain (Substantia nigra) Pool | 4.2
Liver | 0.5 | Brain (Thalamus) Pool | 6.3
Fetal Liver | 2.5 | Brain (whole) | 4.8
Liver ca. HepG2 | 4.6 | Spinal Cord Pool | 5.0
Kidney Pool | 10.1 | Adrenal Gland | 0.4
Fetal Kidney | 5.2 | Pituitary gland Pool | 10.4
Renal ca. 786-0 | 9.8 | Salivary Gland | 21.0
Renal ca. A498 | 7.9 | Thyroid (female) | 3.9
Renal ca. ACHN | 23.0 | Pancreatic ca. CAPAN2 | 33.2
Renal ca. UO-31 | 34.2 | Pancreas Pool | 5.1





Table QC. Panel 4.1D

Tissue Name | Rel. Exp.(%) Ag4977, Run 223692679 | Tissue Name | Rel. Exp.(%) Ag4977, Run 223692679
Secondary Th1 act | 0.1 | HUVEC IL-1beta | 0.3
Secondary Th2 act | 0.0 | HUVEC IFN gamma | 0.2
Secondary Tr1 act | 0.0 | HUVEC TNF alpha + IFN gamma | 0.0
Secondary Th1 rest | | HUVEC TNF alpha + IL4 | 0.5
Secondary Th2 rest | 0.0 | HUVEC IL-11 | 0.1
Secondary Tr1 rest | 0.0 | Lung Microvascular EC none | 0.0
Primary Th1 act | 0.9 | Lung Microvascular EC TNFalpha + IL-1beta | 1.3
Primary Th2 act | 0.1 | Microvascular Dermal EC none | 0.0
Primary Tr1 act | 0.7 | Microvascular Dermal EC TNFalpha + IL-1beta | 0.3
Primary Th1 rest | 0.0 | Bronchial epithelium TNFalpha + IL1beta | 35.1
Primary Th2 rest | 0.0 | Small airway epithelium none | 24.5
Primary Tr1 rest | 0.0 | Small airway epithelium TNFalpha + IL-1beta | 55.9
CD45RA CD4 lymphocyte act | 12.6 | Coronary artery SMC rest | 6.7
CD45RO CD4 lymphocyte act | 1.1 | Coronary artery SMC TNFalpha + IL-1beta | 10.7
CD8 lymphocyte act | 0.0 | Astrocytes rest | 14.6
Secondary CD8 lymphocyte rest | 1.7 | Astrocytes TNFalpha + IL-1beta | 60.7
Secondary CD8 lymphocyte act | 0.0 | KU-812 (Basophil) rest | 3.2
CD4 lymphocyte none | 0.1 | KU-812 (Basophil) PMA/ionomycin | 5.7
2ry Th1/Th2/Tr1_anti-CD95 CH11 | 0.0 | CCD1106 (Keratinocytes) none | 28.9
LAK cells rest | 0.0 | CCD1106 (Keratinocytes) TNFalpha + IL-1beta | 57.4
LAK cells IL-2 | 0.0 | Liver cirrhosis | 9.9
LAK cells IL-2+IL-12 | 0.0 | NCI-H292 none | 40.6
LAK cells IL-2+IFN gamma | 0.0 | NCI-H292 IL-4 | 68.8
LAK cells IL-2+IL-18 | 0.3 | NCI-H292 IL-9 | 100.0
LAK cells PMA/ionomycin | 0.0 | NCI-H292 IL-13 | 70.7
NK Cells IL-2 rest | 0.1 | NCI-H292 IFN gamma | 82.4
Two Way MLR 3 day | 0.0 | HPAEC none | 0.0
Two Way MLR 5 day | 0.1 | HPAEC TNF alpha + IL-1 beta | 1.5
Two Way MLR 7 day | 0.0 | Lung fibroblast none | 4.2
PBMC rest | 0.1 | Lung fibroblast TNF alpha + IL-1 beta | 18.2
PBMC PWM | [illegible] | Lung fibroblast IL-4 | 4.4
PBMC PHA-L | [illegible] | Lung fibroblast IL-9 | 8.9
Ramos (B cell) none | [illegible] | Lung fibroblast IL-13 | 13.7
Ramos (B cell) ionomycin | [illegible] | Lung fibroblast IFN gamma | 4.7
B lymphocytes PWM | 1.0 | Dermal fibroblast CCD1070 rest | 1.7
B lymphocytes CD40L and IL-4 | | Dermal fibroblast CCD1070 TNF alpha | 23.0
EOL-1 dbcAMP | | Dermal fibroblast CCD1070 IL-1 beta | 30.4
EOL-1 dbcAMP PMA/ionomycin | 0.2 | Dermal fibroblast IFN gamma | 2.5
Dendritic cells none | 0.0 | Dermal fibroblast IL-4 | 2.8
Dendritic cells LPS | 0.0 | Dermal Fibroblasts rest | 3.7
Dendritic cells anti-CD40 | 0.0 | Neutrophils TNFa+LPS | 0.0
Monocytes rest | 0.0 | Neutrophils rest | 0.0
Monocytes LPS | 0.0 | Colon | 9.9
Macrophages rest | 0.4 | Lung | 12.7
Macrophages LPS | 0.1 | Thymus | 1.1
HUVEC none | 0.0 | Kidney | 79.6
HUVEC starved | 0.1 | |







General_screening_panel_v1.5 Summary: Ag4977 Highest expression of this gene
is seen in an ovarian cancer cell line (CT=25.8). This gene is widely expressed in this panel,
with high to moderate expression seen in brain, colon, gastric, lung, breast, ovarian, and
melanoma cancer cell lines. This expression profile suggests a role for this gene product in
cell survival and proliferation. Modulation of this gene product may be useful in the
treatment of cancer, specifically in ovarian, colon and renal cancers.

In addition, this gene is expressed at much higher levels in fetal lung tissue (CT=30)
when compared to expression in the adult counterpart (CT=37). Thus, expression of this gene
may be used to differentiate between the fetal and adult source of this tissue.

Among tissues with metabolic function, this gene is expressed at moderate to low
levels in pituitary, adipose, adrenal gland, pancreas, thyroid, and adult and fetal liver. This
widespread expression among these tissues suggests that this gene product may play a role in
normal neuroendocrine and metabolic function and that dysregulated expression of this gene
may contribute to neuroendocrine disorders or metabolic diseases, such as obesity and
diabetes.

This gene is also expressed at moderate levels in the CNS, including the
hippocampus, thalamus, substantia nigra, amygdala, cerebellum and cerebral cortex.
Therefore, therapeutic modulation of the expression or function of this gene may be useful in
the treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease,
schizophrenia, multiple sclerosis, stroke and epilepsy.

Panel 4.1D Summary: Ag4977 Highest expression of this gene is seen in IL-9
treated NCI-H292 cells (CT=28.5). Expression of this gene is also seen at moderate levels in
activated and resting keratinocytes, small airway epithelium, bronchial epithelium, and
astrocytes, a cluster of samples derived from NCI-H292 cells, and normal kidney. Low but
significant levels of expression are seen in treated and untreated dermal and lung fibroblasts.
The expression of this gene in cells derived from the lung and skin suggests that this gene
may be involved in normal conditions as well as pathological and inflammatory lung and skin
disorders that include chronic obstructive pulmonary disease, asthma, allergy, psoriasis and
emphysema.

R. NOV73a and NOV73b: GASDERMIN 

Expression of gene NOV73a and full length physical clone NOV73b was assessed
using the primer-probe set Ag4981, described in Table RA. Results of the RTQ-PCR runs are
shown in Tables RB and RC. Please note that NOV73b represents a full-length physical clone
of the NOV73a gene, validating the prediction of the gene sequence.

Table RA. Probe Name Ag4981



Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-aggcactcccaaaagatgtc-3' | 20 | 1007 | 399
Probe | TET-5'-ccatcctctatttcgttggagcccta-3'-TAMRA | 26 | 1052 | 400
Reverse | 5'-agcttctgttgggcttcact-3' | 20 | 1087 | 401



Table RB. General screening panel v1.5 

5 ' ~ 



Tissue Name 


Rel. j 
Exp.(%) j 

Ag4981, 'Tissue Name 
Run j 
228940922 i 


iRel. 
Exp.(%) 

Run 

228940922 


Adipose 


|2.l 


! Renal ca. TK-10 


0.0 


IMelanoma* Hs688(A).T 


!0.4 " 


jBladder 


I..8 


jMelanoma* Hs688(B).T 


|l. 4 


'Gastric ca. (liver met.) NCI-N87 


: 3.1 


'Melanoma* M14 


0.0 


;Gastric ca. KATO III 


!0.i 


(Melanoma* LOXIMVl 


6.9 


(Colon ca. SW-948 


|0.9 


Melanoma* SK-MEL-5 


[6Td ' 


iColon ca. SW480 


jl0.6 


Squamous cell carcinoma SCC-4 


|0.7 


'Colon ca.* (SW480 met) SW620 


|l.9 


Testis Pool 


3.4 


IColon ca. HT29 


lo.o 


Prostate ca.* (bone met) PC-3 


0.0 


jColon ca. HCT-116 


io.o 


Prostate Pool jo.0 


•Colon ca. CaCo-2 


|0.I 


Placenta 


0.9 


jColon cancer tissue 


1100.0 


jUterus Pool 


0.7 


iColon ca. Swiil6 


!o.i ^ 


lOvarian ca. OVCAR-3 


o.o 


•Colon ca. CoFo-205 


[34.4 


jOvarian ca. SK-OV-3 j0.7 


IColon ca. SW-48 


31.6 


(Ovarian ca. OVCAR-4 


0.1 


IColon Pool 


20 


'Ovarian ca. OVCAR-5 


1.2 


jSmall Intestine Pool 


03 


JOvarian ca. 1GROV-I 


0.0 


jStomach Pool 


0.7 


Ovarian ca. OVCAR-8 


0.4 


•Bone Marrow Pool 


0.7 


jOvary 


0.7 


i Fetal Heart 


0.0 


jBreast ca. MCF-7 


0.0 


■Heart Pool 


0.7 


jBreast ca. MDA-MB-231 


0.0 


jLymph Node Pool 


1.5 


'Breast ca. BT 549 


0.0 


(Fetal Skeletal Muscle 


0.6 


Breast ca. T47D 


0.0 


•Skeletal Muscle Pool 


0.5 


Breast ca. MDA-N 


0.2 


iSpleen Pool 


0.0 


:Breast Pool 


0.6 


(Thymus Pool 


0.6 


jTrachea j 1 .8 


JCNS cancer (glio/astro) U87-MG 


~0.0 


iLung 


0.5 


jCNS cancer (glio/astro) U-l 18-MG 


0.0 


iFetal Lung 


6.3 


|CNS cancer (neurojmet) SK-N-AS 


0.0 


•Lung ca. NCI-N4I7 


0.0 


jCNS cancer (astro) SF-539 


0.0 
Lung ca. LX-1 


|82.4 




CNS cancer (astro) SNB-75 


jU.O 


} Lung ca. NCI-HI 46 


•0.8 




CNS cancer (glio) SNB-19 


In ft 


jLung ca. SHP-77 


CNS cancer (glio) SF-295 


ft ft 


jLung ca. A549 


\o.o 




Brain (Amygdala) Poo! 


!0 3 


.Lung ca. NLl-HDzo 


iO.O 


jBrain (cerebellum) 


jo.o 


{Lung ca. NCI-Hzj 






Brain (fetal) 


jo.i 


jLung ca. NLI-H460 




[Brain (Hippocampus) Pool 


0.2 


jLung ca. HOP-62 


jo.o 


{Cerebral Cortex Pool 


0.0 


jLung ca. NCI-H522 


io.o 

1 , u •> 


jBrain (Substantia nigra) Pool 


io.o 


[Liver 


]6.o 




Brain (Thalamus) Pool 


[6.2 ' 


jFetal Liver 


1.8 


Brain (whole) 


;0.4 


jLiver ca. HepG2 


0.0 


Spinal Cord Pool 


|oT" 


|Kidney Pool 


1.8 


Adrenal Gland II. 9 


jFetal Kidney 


12.3 


Pituitary gland Pool 


|0.4 


jRenal ca. 786-0 


10.9 


Salivary Gland 


T.o " " 


jRenal ca. A498 


0.2 


Thyroid (female) 


[67 


;Renal ca. ACHN 


;0.0 




Pancreatic ca. CAPAN2 


;0.9 


jRenal ca. UO-3 1 


j0.2 




Pancreas Pool j3.0 


Table RC. Panel 4.1D 

Tissue Name   Rel. Exp.(%) Ag4981, Run 223693389   Tissue Name   Rel. Exp.(%) Ag4981, Run 223693389 


.Secondary Thl act 


3.6 


HUVEC IL-lbeta 


0.0 


Secondary Th2 act 


6.3 


HUVECI FN gamma 


0.0 


[Secondary Trl act 




HUVEC TNF alpha + IFN gamma 


o".'o ~ 


jSecondary Thl rest 


3.0 


HUVEC TNF alpha +IL4 


0.0 


'Secondary Tb2 rest 


16.6 


HUVEC IL-ll 


0.0 


{Secondary Trl rest 


0.0 


Lung Microvascular EC none 


__. .. 


j .. 

.Primary Thl act 


0.0 


Lung Microvascular EC TNFalpha + IL- 
Ibeta 


1.7 


r — ' — •■— 

; Primary Th2 act 


29.1 


Microvascular Dermal EC none 


0.0 


'Primary Trl act 


14.6 


Microsvasular Dermal EC TNFalpha + 
IL-lbeta 


0.0 


i 

jPrimary Thl rest 
- — 


9.3 


Bronchial epithelium TNFalpha + 
ILlbeta 


8.4 


! Primary Th2 rest 


9.9 


Small airway epithelium none 


100.0 


Primary Trl rest 


52.4 


Small airway epithelium TNFalpha + IL- 
Ibeta 


44.8 


CD45RA CD4 lymphocyte act 


12.9 


Coronery artery SMC rest 


0.0 


CD45RO CD4 lymphocyte act 


84. 7 


Coronery artery SMC TNFalpha + IL- 
1 beta 


6.8 






CD8 lymphocyte act 


24.5 


Astrocytes rest 


\\A 1 


Secondary CD8 lymphocyte rest 


29.3 


Astrocytes TNFalpha + !L- 1 beta jO.O 


.Secondary CD8 lymphocyte act 


2.0 


KU-812 (Basophil) rest 


jO.O 


CD4 lymphocyte none 


12.9 


KU-81 2 (Basophil) PMA/ionomycin jO.O 


; 2ry Thl/Th2/Trl_anti-CD95 
.CHI 1 


14.7 

— <-• - 


jcCDl 106 (Keratinocytes) none 


k.2 

1 


j 

LAK cells rest 


5.2 


]CCD1 106 (Keratinocytes) TNFalpha + 
jlL-lbeta 


29.1 


; LAK cells IL-2 


16.6 


|Liver cirrhosis 


0.0 


;LAK cells IL-2+IL-12 


3.4 


]NCI-H292 none 


jo".o 


LAK cells IL-2+1FN gamma 


4.4 


jNCI-H292 IL-4 


jo.o 


iLAK cells IL-2+ IL-18 


0.0 


|NCI-H292 IL-9 


0.0 


LAK. cells PMA/ionomvcin 


1.1 


|NCI-H292 IL-13 


0.0 


! NK Celk II -2 rest 


7.3 


|NCI-H292 IFN gamma 


0.0 


Two Wav ML R 3 dav 


8.7 


]HPAEC none 


•bo 


Two Wav MLR 5 dav 


0.0 


jHPAECTNF alpha + IL-I beta 


0.0 


Two Wav MLR 7 dav 


2.2 JLung fibroblast none 


! 0.0 


PBMC rest 


14.2 


Lung fibroblast TNF alpha + 1L-1 beta 


,0.0 


jroMC rWM 


4.5 


Lung fibroblast IL-4 


0.0 


' nr> \ at^ nil a i 

;I BlvlC rHA-L 


14.0 


Lung fibroblast IL-9 


0.5 


jKamos (B cell) none 


2.6 


Lung fibroblast IL-13 


0.0 


jRamos (B cell) ionomycin 


4.0 


Lung fibroblast IFN gamma lO.O 


;b> lymphocytes rWM 


0.0 jDermal fibroblast CCblb70 rest 


4.3 


;h> lymphocytes CD4UL ana IL-4 


0.0 


Dermal fibroblast CCD 1070 TNF alpha " 


r 7.1 


bUL-l aocAIVlr 


0.0 


Dermal fibroblast CCD 1070 IL-1 beta 


6.3 


•C/^l 1 AUr> AN/ID 

,tUL- 1 aDCAIVlr 

;PMA/ionomycin 


0.0 


! 

Dermal fibroblast IFN gamma jO.O 


• Dendritic cells none 


10.7 


Dermal fibroblast IL-4 jO.O 


jDendritic cells LPS 


0.0 


Dermal Fibroblasts rest 


0.0 


[Dendritic cells anti-CD40 

i ■ ■ — ~ — — — - — — — 


2.2 


Neutrophils TNFa+LPS jo.O 


iMonocytes rest 


0.0 


Neutrophils rest 


0.0 


Monocytes LPS 


0.0 


Colon 


3.6 


■Macrophages rest 


29.7 


Lung 


II. 1 


Macrophages LPS 


0.0 


Thymus 


7.4 


HUVEC none 


0.0 


Kidney 


2.1 


HUVEC starved 


O.O 


i 
i 



General_screening_panel_v1.5 Summary: Ag4981 Highest expression is seen in 
colon cancer tissue (CT=28.4), with prominent expression also detected in a lung cancer cell 
line. Thus, expression of this gene could be used to differentiate between these samples and 
other samples on this panel and as a marker to detect the presence of colon cancer. 
Furthermore, therapeutic modulation of the expression or function of this gene may be 
effective in the treatment of colon cancer. 

Low but significant levels of expression are seen in metabolic tissues, including pancreas 
and adrenal gland, suggesting that modulation of the expression or function of this gene may 
be useful in the treatment of metabolic disease, including obesity and diabetes. 

Panel 4.1D Summary: Ag4981 Highest expression of this gene is seen in small 
airway epithelium (CT=31.2). Low but significant levels of expression are detected in TNF-alpha 
and IL-1 beta treated keratinocytes and small airway epithelium, primary T cells, dendritic 
cells, macrophages and normal lung. 

S. NOV75a: Synaptotagmin-Like Protein 3-A 

Expression of gene NOV75a was assessed using the primer-probe set Ag4993, 
described in Table SA. Results of the RTQ-PCR runs are shown in Tables SB and SC. 

Table SA. Probe Name Ag4993 

Primers   Sequences                                     Length   Start Position   SEQ ID No
Forward   5'-aaagtgcaatccgtatgtgaag-3'                  22       1019             402
Probe     TET-5'-acctacctgttgcccgacagatcct-3'-TAMRA     25       1041             403
Reverse   5'-gtgttcctttggactccagtct-3'                  22       1081             404



Table SB. General screening panel v1.5 

Tissue Name   Rel. Exp.(%) Ag4993, Run 228941089   Tissue Name   Rel. Exp.(%) Ag4993, Run 228941089 


Adipose 


1.5 


(Renal ca. TK-10 


2.6 


: Melanoma* Hs688(A).T 


1.2 


jBladder 


2.7 


Melanoma* Hs688(B).T 


2.6 


[Gastric ca. (liver met.) NCI-N87 


13.7 


.Melanoma* M14 


0.4 


jGastric ca. KATO 111 


0.2 


Melanoma* LOXIMVI 


7.7 


|Colon ca. SW-948 


0.1 


.Melanoma* SK-MEL-5 


0.4 


jColon ca. SW480 


16 " 


Squamous cell carcinoma SCC-4 


8.9 


{Colon ca.* (SW480 met) SW620 


0.9 


Testis Pool 




1.7 


jColon ca. HT29 


0.2 


Prostate ca.* (bone met) PC-3 




L0 J 


■Colon ca. HCT-116 


ioo!o 


^Prostate Pool 


2.7 


jColon ca. CaCo-2 


0.8 


'Placenta 


0.4 


Colon cancer tissue 


1.3 


lUterus Pool 


0.9 


jColon ca.SWI 116 


0.2 


^Ovarian ca. OVCAR-3 


1.5 


jColon ca. Colo-205 


0.4 
Ovarian ca. SK-OV-3 


3.8 


Colon ca. SW-48 


T0.2 


;Ovarian ca. OVCAR-4 




10.0 

1 


jColon Pool 


" 


jOvarian ca. OVCAR-5 


5.8 


jSmall Intestine Pool 


jl.2 


lOvarian ca. 1GROV-1 


0.8 


iStomach Pool 


11.2 


jOvarian ca. OVCAR-8 


2.1 


Bone Marrow Pool ;0.8 


! Ovary 




0.8 


jFetal Heart 


[3.0 


Rreast ca MCF-7 




6.4" 

1 . , f . T ... ... 


[Heart Pool 

. l ir^ lf ,,,.^»„r <l) ,.i . , 


{0.6 


Rrea<;t ca MDA-MR-211 




0.8 

1 


|Lymph Node Pool 


h.5 


Breast ca BT S49 




1.6 


'Fetal Skeletal Muscle 


|2.1 


Breast ca T47D 




|K2 


{Skeletal Muscle Pool 




Breast ca MDA-N 




0.2 


Spleen Pool 


i 3 - 7 . 


Breast Pnnl 

LIILCljl 1 UUI 




1.2 


Thymus Pool J9.2 


Trachea 




4.4 


P\J^ rancor tn\\r\/nQtrr\\ 1 !R7-MPi 


1 1 R 7 






0.4 


|i-/i>Jo cancer ^guu/dMruj u-i io-ivhj 


in c 
I 1 " 


jFetal Lung 




5.8 


|CNS cancer (neuro;met) SK-N-AS 


jl.2 


jLungca. NCI-N4I7 






|CNS cancer (astro) SF-539 




jLung ca. LX-! 




4.6 


]CNS cancer (astro) SNB-75 


|1.6 


Lung ca. NCI-HI46 




4.9 


jCNS cancer (glio)SNB-l9 


[0.9 


Lungca. SHP-77 


{4.9 


JCNS cancer (glio)SF-295 


■) — — 

il.5 


Lung ca. A549 




0.5 


jBrain (Amygdala) Pool 


|0.7 


Lungca. NCI-H526 




0.1 


Brain (cerebellum) 


j0.8 


Lungca. NCI-H23 






jBrain (fetal) 


jl.4 


lungca. NC1-H460 




0.3 


jBrain (Hippocampus) Pool 


11.4 


Lung ca. HOP-62 




0.5 


Cerebral Cortex Pool 


[6.4 


Lungca. NCI-H522 


|1.0 


Brain (Substantia nigra) Pool 


[0.7 


jLiver 




0.1 


Brain (Thalamus) Pool 


jl.2 


IFetal Liver 


!o.i 


jBrain (whole) 


[6.8 


jLiver ca. HepG2 


0.2 


Spinal Cord Pool 


|l.4 


'Kidney Pool 


2.7 


Adrenal Gland 


jl.O 


jFetal Kidney 


1.1 


Pituitary gland Pool 


02 


iRenal ca. 786-0 


47.0 


Salivary Gland 


3.7 


jRenal ca. A498 


7.9 


Thyroid (female) 


9.9 


[Renal ca. ACHN 


1.9 


Pancreatic ca. CAPAN2 


2.2 


[Renal ca. UO-3 1 


2.0 


Pancreas Pool 


1.9 


Table SC. Panel 4.1D 

Tissue Name   Rel. Exp.(%) Ag4993, Run 223739949   Tissue Name   Rel. Exp.(%) Ag4993, Run 223739949 


Secondary Th1 act 


88.3 


HUVEC IL-1beta 


0.6 


Secondary Th2 act 


100.0 


HUVEC IFN gamma 


3.6 


Secondary Tr1 act 


62.4 


HUVEC TNF alpha + IFN gamma 


!4.7 


Secondary Th 1 rest 


|ll.4 


HUVEC TNF alpha + IL4 


^2 " 


Secondary Th2 rest 


jl7.2 


HUVEC IL- 11 


j0.5 ■ " 


Secondary Trl rest 


10.7 


Lung Microvascular EC none 


13.7 


Primary Thl act 


13.0 


Lung Microvascular EC TNFalpha + IL- 
1 beta 


4.9 


iPrimary Th2 act 


55.9 


Microvascular Dermal EC none 


|2.0 


Primary Trl act 


27.0 


Microsvasular Dermal EC TNFalpha + 
IL-lbeta 


i 

i 

;5.0 

j 


\ nmary i n i rest 


1 1 ft 
1 l.o 


Bronchial epithelium TNFalpha + 
1 LI beta 


1 .6 


j Primary Th2 rest 


10.2 


Small airway epithelium none 


0.3 


'Primary Trl rest 


15.3 


Small airway epithelium TNFalpha + IL- 
Ibeta 


0.7 


CD45RA CD4 lymphocyte act 


29.9 


Coronery artery SMC rest 


4.2 


.clmdku L.U4 lymphocyte act 




Coronery artery SMC TNFalpha + IL- 
Ibeta 


! 


CDS lymphocyte act 


37.4 


Astrocytes rest 


0.2 


ISecondary CD8 lymphocyte rest 


71.7 " 


Astrocytes TNFalpha + IL- 1 beta 


10.6 


^Secondary CD8 lymphocyte act 


34.6 


KU-812 (Basophil) rest 


il.O 


CD4 lymphocyte none 


8.2 


KU-812 (Basophil) PMA/ionomycin ~J\3 


;2ryThl/Th2/Trl anti-CD95 
CHI1 


30.1 


CCD1 106 (Keratinocytes) none 


3.7 


;LAK cells rest 


11.7 


CCDI 106 (Keratinocytes) TNFalpha + 
IL-lbeta 


2, ' 


LAN cells IL-2 


26.4 


Liver cirrhosis 


4.2 


lais. cens IL-Z+IL-IZ 


j9.2 


NCI-H292 none 


13.8 


lan cens iL-z^irN gamma 




NCI-H292 IL-4 


13.7 


l/an cens il-z^ il-Io 


40. J 


NCI-H292 IL-9 


23.3 


,lais. cens r iviA/ lonomycin 


AO 1 


NCI-H292IL-13 113.4 


'Mk" fVlk II -9 r^cr 
in is v^ciib iLj-4, resi 




NCI-H292 IFN gamma 


21.0 


i wo way ivilk j aay 


if T 
15./ 


H PA EC none 


0.5 


'Tvvr» W/nv Ml P ^ rli\/ 

i wo way iviL,r\ j aay 


ZI.O 


HPAEC TNF alpha +IL-I beta 


4.0 


; j wo way iviLrv / aay 




Lung fibroblast none , 


1.9 


PBMC rest 


6.9 


Lung fibroblast TNF alpha + IL-1 beta 


0.9 


PBMC PWM 


77.4 


Lung fibroblast IL-4 |0.3 


PBMC PHA-L 


56.6 


Lung fibroblast IL-9 jb.6 


.Ramos (B cell) none 


1.3 


Lung fibroblast IL-I3 


0.4 


Ramos (B cell) ionomycin 


2.0 


Lung fibroblast IFN gamma 


1.7 


B lymphocytes PWM 


22.4 


Dermal fibroblast CCD 1070 rest 


1.2 


: B lymphocytes CD40L and IL-4 


5-2 


Dermal fibroblast CCD 1 070 TNF alpha 


18.0 


[eOL-I dbcAMP 


6.2 


Dermal fibroblast CCDI 070 IL- 1 beta 


0.9" 


'EOL-I dbcAMP 
;PMA/ionomycin 


17.7 


Dermal fibroblast IFN gamma 


2.7 


r— — 1 • 

;Dendritic cells none 


4.7 


Dermal fibroblast IL-4 


1.2 




jDendritic cells LPS 


|4.l 


Dermal Fibroblasts rest 


1 1.6 


•Dendritic cells anti-CD40 

i 




Neutrophils TNFa+LPS U.8 


jMonocytes rest 


]\4J ~^ 


Neutrophils rest ;I3.5 


Monocytes LPS 


16.8 


Colon 


13.3 


Macrophages rest 


' 17.5 


Lung 


; 3.0 


Macrophages LPS 




Thymus 122.1 


HUVEC none 


16.5 


Kidney 


;1.4 


HUVEC starved 




i 



General_screening_panel_v1.5 Summary: Ag4993 Highest expression of this gene 
is seen in a colon cancer cell line (CT=25.3). High levels of expression are also seen in a 
renal cancer cell line. Thus, expression of this gene could be used to differentiate between 
these samples and other samples on this panel and as a marker of these cancers. In addition, 
therapeutic modulation of the expression or function of this gene may be useful in the 
treatment of these cancers. 

In addition, this gene is expressed at much higher levels in fetal lung tissue (CT=29.4) 
when compared to expression in the adult counterpart (CT=33.3). Thus, expression of this 
gene may be used to differentiate between the fetal and adult source of this tissue. In addition, 
modulation of the expression or function of this gene may be useful in the treatment of 
diseases that affect this tissue. 
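
The fetal versus adult lung comparison above can be made concrete with a little arithmetic. In real-time PCR a lower CT means more starting template. If one assumes ideal amplification (product doubles every cycle) and equal sample input, neither of which is stated in this section, a CT gap of d cycles corresponds to roughly a 2^d-fold difference. The sketch below applies that to the two CT values quoted; the function name and the amplification parameter are illustrative assumptions.

```python
def fold_difference(ct_lower_expr: float, ct_higher_expr: float,
                    amplification_per_cycle: float = 2.0) -> float:
    """Approximate fold difference in template between two samples.

    Assumes the same amplification factor in both reactions and equal
    input/normalization; 2.0 means perfect doubling each cycle.
    """
    return amplification_per_cycle ** (ct_lower_expr - ct_higher_expr)


# CT values quoted in the summary for primer-probe set Ag4993.
FETAL_LUNG_CT = 29.4
ADULT_LUNG_CT = 33.3

fold = fold_difference(ADULT_LUNG_CT, FETAL_LUNG_CT)
print(f"Fetal lung vs adult lung: about {fold:.1f}-fold higher")
```

Under these assumptions the 3.9-cycle gap works out to roughly a 15-fold difference. A real assay with lower amplification efficiency would give a smaller figure.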

Among tissues with metabolic function, this gene is expressed at moderate levels in 
pituitary, adipose, adrenal gland, pancreas, thyroid, and adult and fetal skeletal muscle, heart, 
and liver. This widespread expression among these tissues suggests that this gene product 
may play a role in normal neuroendocrine and metabolic function and that dysregulated 
expression of this gene may contribute to neuroendocrine disorders or metabolic diseases, 
such as obesity and diabetes. 

This gene is also expressed at moderate levels in the CNS, including the 
hippocampus, thalamus, substantia nigra, amygdala, cerebellum and cerebral cortex. 

Therefore, therapeutic modulation of the expression or function of this gene may be useful in 
the treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease, 
schizophrenia, multiple sclerosis, stroke and epilepsy. 

Panel 4.1D Summary: Ag4993 Highest expression of this gene is seen in 
chronically activated Th2 cells (CT=26.2). This gene is widely expressed on this panel, but is 
more prominently expressed in T cells, B cells and LAK cells. Therefore, the putative protein 
encoded by this gene could potentially be used diagnostically to identify B or T cells. In 
addition, the gene product could also potentially be used therapeutically in the treatment of 
asthma, emphysema, IBD, lupus or arthritis and in other diseases in which T cells and B cells 
are involved. 

T. NOV76a: Hypothetical Intracellular 

Expression of gene NOV76a was assessed using the primer-probe set Ag5023, 
described in Table TA. Results of the RTQ-PCR runs are shown in Tables TB, TC and TD. 

Table TA. Probe Name Ag5023 

Primers   Sequences                                     Length   Start Position   SEQ ID No
Forward   5'-cgtgtcaacggtatgtttctct-3'                  22       1175             405
Probe     TET-5'-taattcttgcacatccaaggatggca-3'-TAMRA    26       1221             406
Reverse   5'-agagaaagcaatggttccattt-3'                  22       1249             407



Table TB. CNS neurodegeneration v1.0 

Tissue Name                     Rel. Exp.(%) Ag5023, Run 224757456  Tissue Name                     Rel. Exp.(%) Ag5023, Run 224757456
AD 1 Hippo                      19.6                                Control (Path) 3 Temporal Ctx   7.1
AD 2 Hippo                      25.9                                Control (Path) 4 Temporal Ctx   18.3
AD 3 Hippo                      9.7                                 AD 1 Occipital Ctx              20.2
AD 4 Hippo                      7.1                                 AD 2 Occipital Ctx (Missing)    0.0
AD 5 Hippo                      73.7                                AD 3 Occipital Ctx              12.9
AD 6 Hippo                      71.7                                AD 4 Occipital Ctx              18.8
Control 2 Hippo                 21.6                                AD 5 Occipital Ctx              20.0
Control 4 Hippo                 9.6                                 AD 6 Occipital Ctx              31.2
Control (Path) 3 Hippo          4.7                                 Control 1 Occipital Ctx         4.2
AD 1 Temporal Ctx               26.8                                Control 2 Occipital Ctx         40.1
AD 2 Temporal Ctx               37.9                                Control 3 Occipital Ctx         12.4
AD 3 Temporal Ctx               6.8                                 Control 4 Occipital Ctx         5.9
AD 4 Temporal Ctx               15.1                                Control (Path) 1 Occipital Ctx  69.7
AD 5 Inf Temporal Ctx           100.0                               Control (Path) 2 Occipital Ctx  9.6
AD 5 Sup Temporal Ctx           39.0                                Control (Path) 3 Occipital Ctx  4.7
AD 6 Inf Temporal Ctx           56.3                                Control (Path) 4 Occipital Ctx  23.0
AD 6 Sup Temporal Ctx           61.1                                Control 1 Parietal Ctx          9.1
Control 1 Temporal Ctx          14.6                                Control 2 Parietal Ctx          52.9
Control 2 Temporal Ctx          25.9                                Control 3 Parietal Ctx          17.8
Control 3 Temporal Ctx          18.2                                Control (Path) 1 Parietal Ctx   55.1
Control 4 Temporal Ctx          4.2                                 Control (Path) 2 Parietal Ctx   17.4
Control (Path) 1 Temporal Ctx   35.6                                Control (Path) 3 Parietal Ctx   5.7
Control (Path) 2 Temporal Ctx   29.3                                Control (Path) 4 Parietal Ctx   8.7


Table TC. General screening panel v1.5 

Tissue Name   Rel. Exp.(%) Ag5023, Run 228959376   Tissue Name   Rel. Exp.(%) Ag5023, Run 228959376 


Adipose |2.3 


Renal ca. TK-10 


5.7 


Melanoma* Hs688(A).T 


j2.3 


Bladder 




3.6 


Melanoma* Hs688(B).T |3.0 


Gastric ca. (liver met.) NCI-N87 




7.9 


Melanoma* MI4 


|2.l 


Gastric ca. KATO III 


j8.4 


Melanoma* LOXIMVI 


12.7 


Colon ca. SW-948 




2.1 


Melanoma* SK-MEL-5 


•3.3 


Colon ca. SW480 


jl.9 


Squamous cell carcinoma SCC-4 




Colon ca.* (SW480 met) SW620 




1.3 


Testis Pool 


•2.0 


Colon ca. HT29 




1.3 


Prostate ca.* (bone met) PC-3 


i4.2 


^olon ca. HCT-116 


!5.3 


Prostate Pool i2.5 


Colon ca. CaCo-2 




3.3 


Placenta 


j0.7 


Colon cancer tissue 




1.5 


Uterus Pool [2.4 


Colon ca. SW1116 


j0.5 


Ovarian ca. OVCAR-3 


4.5 


Colon ca. Colo-205 




n.i 


Ovarian ca. SK-OV-3 jl 1.1 jColon ca. SW-48 






Ovarian ca. OVCAR-4 |l.3 


Colon Pool 




Ovarian ca. OVCAR-5 


5.2 


Small Intestine Pool 






jOvarian ca. IGROV-I 


3.2 jStomach Pool 




3.0 ~] 


'Ovarian ca. OVCAR-8 


2.4 iBone Marrow Pool 


jl.5 


Ovary 


3.0 


Fetal Heart 




3.1 


Breast ca. MCF-7 j4.5 


Heart Pool 




2-5 


Breast ca. MDA-MB-23 1 j9.9 


Lymph Node Pool 


j4.9 


[Breast ca. BT 549 


5.1 


Fetal Skeletal Muscle 




4.5 


[Breast ca. T47D 


1.4 


Skeletal Muscle Pool 




10.7 ] 


iBreast ca. MDA-N 


1.8 


Spleen Pool 




±} . .. _j 


! Breast Pool 


4.1 


Thymus Pool 


2.5 ; 


;Trachea 


1 .4 jCNS cancer (glio/astro) U87-MG 




16.6 


Lung 


0.0 ;CNS cancer (glio/astro) U-l 1 8-MG 


jl 1.2 


[Fetal Lung 


8.0 


CNS cancer (neuro;met) SK-N-AS 


""P'\ 


jLung ca. NCI-N417 


1 . 1 {CNS cancer (astro) SF-539 


1 




Lungca. LX-I 


4.2 jCNS cancer (astro) SNB-75 


IM J 


Lungca. NCI-HI 46 jo.6 


CNS cancer (glio) SNB-19 


|3.6 - 


Lungca. SHP-77 


2.1 


CNS cancer (glio) SF-295 


1 


12.1 


Lung ca. A549 


1.6 


Brain (Amygdala) Pool 


1 


1.3 
Lung ca. NCI-H526 


7.0 " 


Brain (cerebellum) 


4.4 


: Lung ca.NCI-H23 


2.1 


Brain (fetal) 


2.1 


Ijung ca. NCI-H460 


0.6 


Brain (Hippocampus) Pool 


0.7 


jLung ca. HOP-62 


1.3 


Cerebral Cortex Pool 


0.6 


;Lungca.NCI-H522 


I.I 


Brain (Substantia nigra) Pool 


0.6 




Liver 


e - - 


Brain (Thalamus) Pool f0.9 


Fetal Liver i 100.0 


Brain (whole) 


0.2 


Liver ca. HepG2 


2.2 


Spinal Cord Pool 


2.2 


jKJdney Pool 




Adrenal Gland J 1.8 


Fetal Kidney 


4.6 jPituitary gland Pool 


1.0 


Renal ca. 786-0 |6.2 


Salivary Gland 


0.9 


Renal ca. A498 


0.7 


Thyroid (female) 


1.4 


Renal ca. ACHN 


2.4 


Pancreatic ca.CAPAN2 


4.6 


Renal ca. UO-31 


4.5 


Pancreas Pool 


7.3 



Table TD. Panel 4.1D 

Tissue Name   Rel. Exp.(%) Ag5023, Run 223740941   Tissue Name   Rel. Exp.(%) Ag5023, Run 223740941 


^Secondary Thl act |s0.7 


HUVEC IL-lbeta |20.4 


[Secondary Th2 act 


Too.o 


HUVEC I FN gamma ;26.4 


jSecondary Trl act 


67.4 


HUVEC TNF alpha + I FN gamma |20.0 


{Secondary Thl rest 


18.7 


HUVEC TNF alpha + 1L4 125.7 


Secondary Th2 rest 


43.8 


HUVEC IL-11 ;I6.6 


jSecondary Trl rest 


10.7 


Lung Microvascular EC none |46.7 


■Primary Thl act 


42.0 
46.3 


Lung Microvascular EC TNFalpha + 1L- 
Ibeta 


37.6 


'Primary Th2 act 


Microvascular Dermal EC none 


33.0 


! 

.Primary Trl act 

i 


60.7 


Microsvasular Dermal EC TNFalpha + 
IL-lbeta 


21.9 


; PrimaryThl rest 


22.7 


Bronchial epithelium TNFalpha + 
1 L 1 beta 


35.1 


'Primary Th2 rest 


6.5 


Small airway epithelium none 


11.0 


• 

: Primary Trl rest 


33.2 


Small airway epithelium TNFalpha + IL- 
1 beta 


20.2 


! CD45RA CD4 lymphocyte act 


47.0 


Coronery artery SMC rest 


6.1 


CD45RO CD4 lymphocyte act 


52.1 


Coronery artery SMC TNFalpha + IL- 
Ibeta 


24.5 


,CD8 lymphocyte act 


48.3 


Astrocytes rest 


12.5 


i — - - — — ~~ 

;Secondary CD8 lymphocyte rest 


42.9 


Astrocytes TNFalpha + IL- 1 beta 


8.5 


^Secondary CDS lymphocyte act 


24.5 


KU-8 12 (Basophil) rest 


32.1 


•CD4 lymphocyte none 


10.5 


KU-812 (Basophil) PMA/ionomycin 


43.2 






;2ry Thl/Th2/TrI_anti-CD95 

:PU1 1 
V_ rl ! I 


27.9 


CCD1 106 (Keratinocytes) none 


39.2 


LAK cells rest 


23.0 


CCD) 106 (Keratinocvtes) TNFalpha + 
IL-1 beta 


23.8 


LAK cells IL-2 


60.3 


Liver cirrhosis 


4.9 


LAK cells IL-2+IL-I2 


25.9 


NCI-H292 none 


j22.8 


LAK cells IL-2+IFN eamma 


28 5 


NCI-H292 IL-4 n 


27.2 


!LAK cells IL-2+ IL-1 8 


32.3 


NC1-H292 IL-9 


41.2 


;LAK cells PMA/ionomycin 


19.8 


NC1-H292 IL-13 


40.9 


INK Cells IL-2 rest 


44.8 


NCI-H292 I FN gamma 




45.4 


.Two Way MLR 3 day 


27.7 


HPAEC none 


i 

i 


17.0 


|Two Way MLR 5 day 


28.7 


HPAEC TNF alpha + 1L-I beta 




40.9 


ITvvo Wav MLR 7 dav 


28.7 


Lung fibroblast none 




23.2 


jPBMC rest 


15.2 


Lung fibroblast TNF alpha + IL-1 beta 




20.3 


rBMC PWM 


55.5 


Lung fibroblast IL-4 


134.2 


;rt3lVIL, rnA-L 


J 1 .4 


Lung fibroblast IL-9 


|40.1 


;Ramos (B cell) none 


C\ 1 A 

91.4 


Lung fibroblast IL-13 


125.3 


; Ramos (B cell) ionomycin 


46.7 


Lung fibroblast I FN gamma 


|39.8 


jB lymphocytes PWM 


34.4 


Dermal fibroblast CCD 1070 rest 


|49.0 i 


|B lymphocytes CD40L and IL-4 


47.3 


Dermal fibroblast CCDI070 TNF alpha 763.7 


jEOL-l dbcAMP 


^0.7 


Dermal fibroblast CCD 1070 IL-1 beta 


|38.4 


'EOL-I dbcAMP 
jPMA/ionoiTiycin 


51.8 


Dermal fibroblast IFN gamma 


! 

1 

1 


18.8 


'Dendritic cells none 


32.5 


Dermal fibroblast IL-4 


744.4 


pendritic cells LPS 


11.2 


Dermal Fibroblasts rest 


24.8 


iDendritic cells anti-CD40 


29.1 


Neutrophils TNFa+LPS 


jo.o 


■Monocytes rest jll.l 


Neutrophils rest 




Monocytes LPS 


12.2 


Colon 




^Macrophages rest 


30.8 


Lung 




jMacrophages LPS 


8.1 


Thymus 


|20.4 


jHUVEC none 


19.9 


Kidney 


j24.7 


iHUVEC starved 


31.0 





CNS_neurodegeneration_v1.0 Summary: Ag5023 This gene appears to be slightly 
upregulated in the temporal cortex of Alzheimer's disease patients. Therefore, therapeutic 
modulation of the expression or function of this gene may decrease neuronal death and be of 
use in the treatment of this disease. 
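
The temporal cortex comparison behind this summary can be reproduced from the Rel. Exp.(%) values in Table TB. The sketch below simply averages the Alzheimer's disease (AD) and control temporal cortex rows. Pooling the "Control" and "Control (Path)" samples into one group is a choice made for this example, not something the source specifies, and no statistical test is implied. The values are as read from Table TB, and the variable and function names are hypothetical.

```python
from statistics import mean

# Rel. Exp.(%) for Ag5023, temporal cortex rows as read from Table TB.
AD_TEMPORAL = {
    "AD 1 Temporal Ctx": 26.8,
    "AD 2 Temporal Ctx": 37.9,
    "AD 3 Temporal Ctx": 6.8,
    "AD 4 Temporal Ctx": 15.1,
    "AD 5 Inf Temporal Ctx": 100.0,
    "AD 5 Sup Temporal Ctx": 39.0,
    "AD 6 Inf Temporal Ctx": 56.3,
    "AD 6 Sup Temporal Ctx": 61.1,
}

# "Control" and "Control (Path)" pooled together (an assumption of this sketch).
CONTROL_TEMPORAL = {
    "Control 1 Temporal Ctx": 14.6,
    "Control 2 Temporal Ctx": 25.9,
    "Control 3 Temporal Ctx": 18.2,
    "Control 4 Temporal Ctx": 4.2,
    "Control (Path) 1 Temporal Ctx": 35.6,
    "Control (Path) 2 Temporal Ctx": 29.3,
    "Control (Path) 3 Temporal Ctx": 7.1,
    "Control (Path) 4 Temporal Ctx": 18.3,
}


def group_mean(samples: dict) -> float:
    """Arithmetic mean of the relative-expression values in one group."""
    return mean(samples.values())


ad_mean = group_mean(AD_TEMPORAL)
control_mean = group_mean(CONTROL_TEMPORAL)
print(f"AD mean: {ad_mean:.1f}%  control mean: {control_mean:.1f}%  "
      f"ratio: {ad_mean / control_mean:.2f}")
```

The spread within each group is wide (for example 6.8 to 100.0 among the AD samples), so the difference in means should be read as a descriptive summary only.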

General_screening_panel_v1.5 Summary: Ag5023 This gene is widely expressed 
in this panel, with highest expression in fetal liver (CT=26.8). In addition, this gene is 
expressed at much higher levels in fetal lung and lung tissue (CT=30) when compared to 
expression in the adult counterparts (CTs=40). Thus, expression of this gene may be used to 
differentiate between the fetal and adult sources of these tissues. In addition, therapeutic 
modulation of the expression or function of this gene may be useful in the treatment of 
diseases that affect these organs. 

Moderate to low expression is seen in brain, colon, gastric, lung, breast, ovarian, and 
melanoma cancer cell lines. This expression profile suggests a role for this gene product in 
cell survival and proliferation. Modulation of this gene product may be useful in the 
treatment of cancer. 

Among tissues with metabolic function, this gene is expressed at moderate levels in 
pituitary, adipose, adrenal gland, pancreas, thyroid, and adult and fetal skeletal muscle, heart, 
and liver. This widespread expression among these tissues suggests that this gene product 
may play a role in normal neuroendocrine and metabolic function and that dysregulated 
expression of this gene may contribute to neuroendocrine disorders or metabolic diseases, 
such as obesity and diabetes. 

This gene is also expressed at moderate levels in the CNS, including the 
hippocampus, thalamus, substantia nigra, amygdala, cerebellum and cerebral cortex. 

Therefore, therapeutic modulation of the expression or function of this gene may be useful in 
the treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease, 
schizophrenia, multiple sclerosis, stroke and epilepsy. 

Panel 4.1D Summary: Ag5023 Highest expression of this gene is seen in 
chronically activated Th2 cells (CT=30). This gene is also expressed at moderate levels in a 
wide range of cell types of significance in the immune response in health and disease. These 
cells include members of the T-cell, B-cell, endothelial cell, macrophage/monocyte, and 
peripheral blood mononuclear cell family, as well as epithelial and fibroblast cell types from 
lung and skin, and normal tissues represented by colon, lung, thymus and kidney. This 
ubiquitous pattern of expression suggests that this gene product may be involved in 
homeostatic processes for these and other cell types and tissues. This pattern is in agreement 
with the expression profile in General_screening_panel_v1.5 and also suggests a role for the 
gene product in cell survival and proliferation. Therefore, modulation of the gene product 
with a functional therapeutic may lead to the alteration of functions associated with these cell 
types and lead to improvement of the symptoms of patients suffering from autoimmune and 
inflammatory diseases such as asthma, allergies, inflammatory bowel disease, lupus 
erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis. 

U. NOV78a: Selenoprotein X 1 

Expression of gene NOV78a was assessed using the primer-probe set Ag5042, 
described in Table UA. 

Table UA. Probe Name Ag5042 

Primers   Sequences                                     Length   Start Position   SEQ ID No
Forward   5'-tgagccacaagttcctgaac-3'                    20       263              408
Probe     TET-5'-agtcccgattcatattcagcagctcg-3'-TAMRA    26       302              409
Reverse   5'-tgcctttagggacaaacttca-3'                   21       329              410



V. NOV79a: Hypothetical WD-repeat 

Expression of gene NOV79a was assessed using the primer-probe set Ag5050, 
described in Table VA. Results of the RTQ-PCR runs are shown in Tables VB and VC. 

Table VA. Probe Name Ag5050 

Primers   Sequences                                     Length   Start Position   SEQ ID No
Forward   5'-gtgaagccaaactatgctctca-3'                  22       121              411
Probe     TET-5'-caaagcagtgtcctccgtgaaattca-3'-TAMRA    26       165              412
Reverse   5'-atgaacttgccagccactct-3'                    20       201              413



Table VB. CNS neurodegeneration v1.0 

Tissue Name                     Rel. Exp.(%) Ag5050, Run 224080134  Tissue Name                     Rel. Exp.(%) Ag5050, Run 224080134
AD 1 Hippo                      25.3                                Control (Path) 3 Temporal Ctx   5.2
AD 2 Hippo                      27.2                                Control (Path) 4 Temporal Ctx   34.9
AD 3 Hippo                      8.9                                 AD 1 Occipital Ctx              14.5
AD 4 Hippo                      4.6                                 AD 2 Occipital Ctx (Missing)    0.0
AD 5 Hippo                      98.6                                AD 3 Occipital Ctx              9.4
AD 6 Hippo                      67.4                                AD 4 Occipital Ctx              9.1
Control 2 Hippo                 23.5                                AD 5 Occipital Ctx              47.3
Control 4 Hippo                 13.2                                AD 6 Occipital Ctx              29.5
Control (Path) 3 Hippo          8.0                                 Control 1 Occipital Ctx         4.5
AD 1 Temporal Ctx               20.3                                Control 2 Occipital Ctx         50.7
AD 2 Temporal Ctx               30.6                                Control 3 Occipital Ctx         16.5
AD 3 Temporal Ctx               6.7                                 Control 4 Occipital Ctx         6.9
AD 4 Temporal Ctx                                                   Control (Path) 1 Occipital Ctx
AD 5 Inf Temporal Ctx           100.0                               Control (Path) 2 Occipital Ctx
AD 5 Sup Temporal Ctx           51.4                                Control (Path) 3 Occipital Ctx  5.4
AD 6 Inf Temporal Ctx           74.7                                Control (Path) 4 Occipital Ctx  17.2
AD 6 Sup Temporal Ctx           85.9                                Control 1 Parietal Ctx          6.9
Control 1 Temporal Ctx          7.5                                 Control 2 Parietal Ctx          42.9
Control 2 Temporal Ctx          47.6                                Control 3 Parietal Ctx          21.9
Control 3 Temporal Ctx          17.9                                Control (Path) 1 Parietal Ctx   68.8
Control 4 Temporal Ctx          12.2                                Control (Path) 2 Parietal Ctx   22.2
Control (Path) 1 Temporal Ctx   55.9                                Control (Path) 3 Parietal Ctx   6.4
Control (Path) 2 Temporal Ctx   34.2                                Control (Path) 4 Parietal Ctx   36.9

Table VC. Panel 4.1D 

Tissue Name   Rel. Exp.(%) Ag5050, Run 223796098   Tissue Name   Rel. Exp.(%) Ag5050, Run 223796098 


jSecondary Thl act 


59.0 


HUVEC IL-lbeta 


|44.4 


'Secondary Th2 act 


61 7 


HUVEC IFN gamma 


133.7 


.Secondary Trl act 


49.0 


HUVEC TNF alpha + IFN gamma 


126.4 


^Secondary Thl rest 


7.4 


HUVEC TNF alpha +-1L4 


|52.1 


jSecondary Th2 rest 


12.4 


HUVEC IL-11 


|19.1 


Secondary Trl rest 


9.5 


Lung Microvascular EC none 


j49.3 


, — , „ — . 

{Primary Thl act 


45.1 


Lung Microvascular EC TNFalpha + IL- 
1 beta 


36.3 


Primary Th2 act 


63.7 


Microvascular Dermal EC none 


[23.5 


'Primary Trl act 

i 


54.7 


Microsvasular Dermal EC TNFalpha + 
IL-lbeta 


25.3 


f 

jPnmaryThl rest 


11.0 


Bronchial epithelium TNFalpha + 
I LI beta 


28.7 


Primary Th2 rest 


1.9 


Small airway epithelium none 


15.5 


■Primary Trl rest 


15.8 


Small airway epithelium TNFalpha + IL- 
Ibeta 


23.0 


: CD45RA CD4 lymphocyte act 


49.3 


Coronery artery SMC rest 


18.4 


CD45RO CD4 lymphocyte act 


44.8 


Coronery artery SMC TNFalpha + IL- 
Ibeta 


13.0 


'CD8 lymphocyte act 


51.4 


Astrocytes rest 


7.2 


Secondary CD8 lymphocyte rest 


44.1 


Astrocytes TNFalpha + IL- 1 beta i7.0 


Secondary CD8 lymphocyte act 


15.0 


KU-812 (Basophil) rest 


44.8 


CD4 lymphocyte none 


6.3 


KU-8 12 (Basophil) PMA/ionomycin 


48.3 


2ryThl/Th2/Trl anti-CD95 
ICH1I 


13.6 


CCD1 106 (Keratinocytes) none 


100.0 
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{ 

=LAK cells rest 

! 


10.9 


iCCDI 106 (Keratinocytes) TNFalpha + 
II -lbeta 


45.1 


'l Al/ nolle II 0 

.LAN. CcllS IL«-Z 


if\ 4 

ZU.t 




4 S 
H.J 


!| Ak' opIIq TI -0+U -1? 

,LAl\ LLIlb 1 L< Z. ' 1 L, 1 Z 


16 4 


NCI-H292 none 


24 S 


:I AK" rpllc II -?+IF>J oamma 
jLAlN. Cclib T IriN gdlilma 


17 7 


NCI-H292 II -4 


48 fi 




2ft 1 




11 7 


{LfAN ceus riviA/ionornycin 


24 8 


NCI-H2Q2 II -11 

i ^ V— 1 I 1 L_< 1 J 


40 1 


jiNN. v_ens il-z rest 


14 Q 




HQ.yJ 


1 1 wo way iviLK j aay 


00 7 
zz. / 


nrrtcv^ nunc » 


1 ^ 1 


j i wo way iviLK j aay 


i 1 n 




OD.J ! 


i wo way iviLK / aay 


1 R 7 


Lung iiDruDidsi none 




PBMC rest 


5.8 


Lung fibroblast TNF alpha + IL-1 beta 


14.1 j 


PBMC PWM 


35.4 


Lung fibroblast IL-4 


44.4 


PBMC PHA-L 


29.1 


Lung fibroblast IL-9 


42.6 


[Ramos (B cell) none 


46.0 


Lung fibroblast IL-1 3 


25.9 


jRamos (B cell) ionomycin 


53.6 


Lung fibroblast IFN gamma 


45.4 


f B lymphocytes PWM 


35.1 


Dermal fibroblast CCD 1 070 rest 


54.7 


IB lymphocytes CD40L and IL-4 


19.5 


Dermal fibroblast CCD1070 TNF alpha 


50.0 


jEOL-1 dbcAMP 


66.4 


Dermal fibroblast CCD1070 IL-1 beta 


28.7 


EOL-1 dbcAMP 
jPMA/ionomycin 


44.4 


Dermal fibroblast IFN gamma 


14.9 


! r%i*nH rit \c ppIIc nrtnp 
jL/ci iu 1 1 Lens nunc 


1 1 Q 


Dermal fibroblast IL-4 


35.6 


[Dendritic ceils LPS 


10.2 


Dermal Fibroblasts rest 


17.0 


[Dendritic cells anti-CD40 


21.2 


Neutrophils TNFa+LPS 


0.2 


{Monocytes rest 


11.5 


Neutrophils rest 


1.3 


jMonocytes LPS 

j. , , -v.. - ..... 


15.0 


Colon 


5.7 


Macrophages rest 


14.8 


Lung 


7.2 


{Macrophages LPS 


5.3 


Thymus Jl5.1 


jHUVEC none 


32.5 


Kidney 


12.8 


IHUVEC starved 

t . ., . ...,„_. .... 


39.5 


1 



CNS_neurodegeneration_v1.0 Summary: Ag5050 This gene appears to be slightly
upregulated in the temporal cortex of Alzheimer's disease patients. Therefore, therapeutic
modulation of the expression or function of this gene may decrease neuronal death and be of
use in the treatment of this disease.

Panel 4.1D Summary: Ag5050 Highest expression of this gene is seen in
keratinocytes (CT=29). This gene is also expressed at moderate levels in a wide range of cell
types of significance in the immune response in health and disease. These cells include
members of the T-cell, B-cell, endothelial cell, macrophage/monocyte, and peripheral blood
mononuclear cell family, as well as epithelial and fibroblast cell types from lung and skin,
and normal tissues represented by colon, lung, thymus and kidney. This ubiquitous pattern of
expression suggests that this gene product may be involved in homeostatic processes for
these and other cell types and tissues, and also suggests a role for the gene product in cell
survival and proliferation. Therefore, modulation of the gene product with a functional
therapeutic may lead to the alteration of functions associated with these cell types and lead to
improvement of the symptoms of patients suffering from autoimmune and inflammatory
diseases such as asthma, allergies, inflammatory bowel disease, lupus erythematosus,
psoriasis, rheumatoid arthritis, and osteoarthritis.

W. NOV84a and NOV84b: GTF2IRD1 

Expression of gene NOV84a and full-length physical clone NOV84b was assessed
using the primer-probe set Ag3588, described in Table WA. Results of the RTQ-PCR runs
are shown in Tables WB and WC. Please note that NOV84b represents a full-length physical
clone of the NOV84a gene, validating the prediction of the gene sequence.

Table WA. Probe Name Ag3588

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-tgggagagagcgtatttcttc-3' | 21 | 1670 | 414
Probe | TET-5'-tggaagtacagaatattccaacatgtctca-3'-TAMRA | 30 | 1639 | 415
Reverse | 5'-acacagacatgctttgtttgc-3' | 21 | 1615 | 416



Table WB. CNS neurodegeneration v1.0

Tissue Name | Rel. Exp.(%) Ag3588, Run 211006685 | Tissue Name | Rel. Exp.(%) Ag3588, Run 211006685
AD 1 Hippo | 24.8 | Control (Path) 3 Temporal Ctx | 10.0
AD 2 Hippo |  | Control (Path) 4 Temporal Ctx | 25.0
AD 3 Hippo | 12.5 | AD 1 Occipital Ctx | 21.8
AD 4 Hippo | 13.5 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 55.1 | AD 3 Occipital Ctx | 12.8
AD 6 Hippo | 50.3 | AD 4 Occipital Ctx | 26.2
Control 2 Hippo | 29.9 | AD 5 Occipital Ctx | 12.8
Control 4 Hippo | 22.2 | AD 6 Occipital Ctx | 33.7
Control (Path) 3 Hippo | 9.7 | Control 1 Occipital Ctx | 5.0
AD 1 Temporal Ctx | 35.4 | Control 2 Occipital Ctx | 45.4
AD 2 Temporal Ctx | 32.3 | Control 3 Occipital Ctx | 24.5
AD 3 Temporal Ctx | 8.7 | Control 4 Occipital Ctx | 15.1
AD 4 Temporal Ctx
iAD 5 Inf Temporal Ctx 



]AD 5 SupTemporal Ctx 



iAD 6 Inf Temporal Ctx 



;AD 6 Sup Temporal Ctx 
[Controf 1 Temporal Ctx 
fcontrol 2 Temporal Ctx 

r~ — 



|28.7 
jlOCLO 



[Control (Path) I Occipital Ctx 
{Control (Path) 2 Occipital Ctx 



177.4 



50.3 



Control (Path) 3 Occipital Ctx 



]58.2 jControl (Path) 4 Occipital Ctx 



44.1 



Control 1 Parietal Ctx 



& 6 
j28.7 



Control 
Control 



2 Parietal Ctx 

3 Parietal Ctx 



13.6 



115.3 



12.9 



|49.3 
114.9 



jControl 3 Temporal Ctx 



;20.9 



Control (Path) 1 Parietal Ctx 



jConJxoU Temporal Ctx^ j 16.4 

(Control (Path) 1 Temporal Ctx [46.7 



Control (Path) ^Parietal Ctx 
Control (Path) 3 Parietal Ctx 



152.5 



[25.2 



jC ontrol (P ath) 2 Te mporal Ctx | 28 .3 jControl (Path) 4 Parietal Ctx 43.8 



Table WC. general oncology screening panel v 2.4

Tissue Name | Rel. Exp.(%) Ag3588, Run 267747154 | Tissue Name | Rel. Exp.(%) Ag3588, Run 267747154
Colon cancer 1 | 8.2 | Bladder NAT 2 | 1.6
Colon NAT 1 | 5.8 | Bladder NAT 3 | 
Colon cancer 2 | 4.4 | Bladder NAT 4 | 10.4
Colon NAT 2 | 8.2 | Prostate adenocarcinoma 1 | 32.8
Colon cancer 3 | 18.3 | Prostate adenocarcinoma 2 | 
Colon NAT 3 | 17.3 | Prostate adenocarcinoma 3 | 15.1
Colon malignant cancer 4 | 8.5 | Prostate adenocarcinoma 4 | 13.0
Colon NAT 4 | 2.6 | Prostate NAT 5 | 
Lung cancer 1 | 13.0 | Prostate adenocarcinoma 6 | 17.3
Lung NAT 1 | 2.4 | Prostate adenocarcinoma 7 | 
Lung cancer 2 | 37.4 | Prostate adenocarcinoma 8 | 3.3
Lung NAT 2 | 2.2 | Prostate adenocarcinoma 9 | 36.9
Squamous cell carcinoma 3 | 12.5 | Prostate NAT 10 | 13.9
Lung NAT 3 | 1.5 | Kidney cancer 1 | 25.5
Metastatic melanoma 1 | 56.3 | Kidney NAT 1 | 15.8
Melanoma 2 | 2.5 | Kidney cancer 2 | 35.8
Melanoma 3 | 2.8 | Kidney NAT 2 | 25.9
Metastatic melanoma 4 | 88.3 | Kidney cancer 3 | 36.9
Metastatic melanoma 5 | 100.0 | Kidney NAT 3 | 10.6
Bladder cancer 1 | 2.5 | Kidney cancer 4 | 13.9
Bladder NAT 1 | 0.0 | Kidney NAT 4 | 6.1
Bladder cancer 2 | 4.2 |  | 




CNS_neurodegeneration_v1.0 Summary: Ag3588 This gene appears to be slightly
upregulated in the temporal cortex of Alzheimer's disease patients. Therefore, therapeutic
modulation of the expression or function of this gene may decrease neuronal death and be of
use in the treatment of this disease.

general oncology screening panel_v_2.4 Summary: Ag3588 This gene is widely
expressed in this panel, with highest expression in melanoma (CT=28.6). In addition, this
gene is more highly expressed in lung cancer than in the corresponding normal adjacent
tissue. Thus, expression of this gene could be used as a marker of these cancers. Furthermore,
therapeutic modulation of the expression or function of this gene product may be useful in the
treatment of lung cancer and melanoma.
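
The comparison of lung cancer with normal adjacent tissue (NAT) described above can be illustrated with a minimal sketch. It uses the relative expression percentages listed for Ag3588 in Table WC and assumes, as the sample naming suggests, that "Lung cancer N" and "Lung NAT N" are matched samples. The function and variable names are hypothetical and are not part of the original disclosure.

```python
# Relative expression (%) for Ag3588 as listed in Table WC.
# Assumption: "Lung cancer N" and "Lung NAT N" are matched samples.
LUNG_PAIRS = {
    "Lung 1": (13.0, 2.4),  # (cancer, normal adjacent tissue)
    "Lung 2": (37.4, 2.2),
}


def tumor_to_nat_ratio(tumor_pct: float, nat_pct: float) -> float:
    """Return the fold difference between tumor and matched NAT expression."""
    if nat_pct <= 0:
        raise ValueError("NAT expression must be positive to form a ratio")
    return tumor_pct / nat_pct


# Compute the tumor/NAT ratio for each assumed pair.
ratios = {name: tumor_to_nat_ratio(t, n) for name, (t, n) in LUNG_PAIRS.items()}
for name, ratio in ratios.items():
    print(f"{name}: tumor/NAT = {ratio:.1f}")
```

A ratio well above 1 for every pair is what the summary means by "more highly expressed in lung cancer"; the sketch makes no claim about statistical significance.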

X. NOV85a and NOV85b: Intracellular Protein 

Expression of gene NOV85a and full-length physical clone NOV85b was assessed
using the primer-probe sets Ag3597 and Ag3679, described in Tables XA and XB. Results of
the RTQ-PCR runs are shown in Tables XC, XD, XE and XF. Please note that NOV85b
represents a full-length physical clone of the NOV85a gene, validating the prediction of the
gene sequence.

Table XA. Probe Name Ag3597

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-aaggaacacagcctacttgtca-3' | 22 | 187 | 417
Probe | TET-5'-cttcaaccacctaacagccacagcag-3'-TAMRA | 26 | 158 | 418
Reverse | 5'-aaagcccactaggagagagaca-3' | 22 | 132 | 419



Table XB. Probe Name Ag3679

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-acaaaggaacacagcctacttg-3' | 22 | 190 | 420
Probe | TET-5'-cttcaaccacctaacagccacagcag-3'-TAMRA | 26 | 158 | 421
Reverse | 5'-gcccactaggagagagacactt-3' | 22 | 135 | 422



Table XC. CNS neurodegeneration v1.0

Tissue Name | Rel. Exp.(%) Ag3597, Run 211010103 | Tissue Name | Rel. Exp.(%) Ag3597, Run 211010103
AD 1 Hippo | 18.2 | Control (Path) 3 Temporal Ctx | 11.0
AD 2 Hippo | 24.0 | Control (Path) 4 Temporal Ctx | 28.7
AD 3 Hippo | 13.8 | AD 1 Occipital Ctx | 21.0
AD 4 Hippo | 7.1 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 72.7 | AD 3 Occipital Ctx | 
AD 6 Hippo | 47.6 | AD 4 Occipital Ctx | 18.0
Control 2 Hippo | 19.5 | AD 5 Occipital Ctx | 32.3
Control 4 Hippo | 9.8 | AD 6 Occipital Ctx | 30.8
Control (Path) 3 Hippo | 11.3 | Control 1 Occipital Ctx | 7.9
AD 1 Temporal Ctx | 26.6 | Control 2 Occipital Ctx | 33.0
AD 2 Temporal Ctx | 32.3 | Control 3 Occipital Ctx | 18.7
AD 3 Temporal Ctx | 7.0 | Control 4 Occipital Ctx | 8.8
AD 4 Temporal Ctx | 29.1 | Control (Path) 1 Occipital Ctx | 55.9
AD 5 Inf Temporal Ctx | 100.0 | Control (Path) 2 Occipital Ctx | 12.5
AD 5 Sup Temporal Ctx | 49.7 | Control (Path) 3 Occipital Ctx | 11.3
AD 6 Inf Temporal Ctx | 47.0 | Control (Path) 4 Occipital Ctx | 14.6
AD 6 Sup Temporal Ctx | 42.9 | Control 1 Parietal Ctx | 
Control 1 Temporal Ctx | 10.0 | Control 2 Parietal Ctx | 
Control 2 Temporal Ctx | 25.2 | Control 3 Parietal Ctx | 15.9
Control 3 Temporal Ctx | 77.1 | Control (Path) 1 Parietal Ctx | 43.2
Control 4 Temporal Ctx | 12.7 | Control (Path) 2 Parietal Ctx | 22.4
Control (Path) 1 Temporal Ctx | 37.9 | Control (Path) 3 Parietal Ctx | 10.7
Control (Path) 2 Temporal Ctx | 27.9 | Control (Path) 4 Parietal Ctx | 28.3



Table XD. General screening panel v1.4

Tissue Name | Rel. Exp.(%) Ag3597, Run 218307127 | Rel. Exp.(%) Ag3679, Run 218941309 | Tissue Name | Rel. Exp.(%) Ag3597, Run 218307127 | Rel. Exp.(%) Ag3679, Run 218941309


iAdipose 


17.7 


4.6 


Renal ca. TK-10 


26.8 


25.0 


jMelanoma* 
|Hs688(A).T 


22.2 


22.5 


Bladder 


23.0 


27.7 


jMelanoma* | 9? . 
;Hs688(B).T f 


23.5 


Gastric ca. (liver met.) 
NCI-N87 


36.9 


37.1 


jMelanoma* M14 J19.9 


21.3 'Gastric ca. KATO III 


45.7 


51.8 


iMelanoma* L. - 
LOXIMVI \ 


21.5 


Colon ca. SW-948 


7.2 


10.7 


.Melanoma* SK- 
MEL-5 


27.4 


38.2 


Colon ca. SW480 


26.8 


46.0 


Squamous cell 
carcinoma SCC-4 


22.8 


32.3 


Colon ca* (SW480 met) 
SW620 


21.5 


19.3 


Testis Pool 


31.0 


26.1 jColon ca. HT29 


11.4 


10.5 


{Prostate ca.* (bone 
jmet) PC-3 


42.3 


43.5 


Colon ca. HCT-1 16 


32.1 


34.9 


'Prostate Pool 


12.4 


13.1 


Colon ca. CaCo-2 


27.7 


33.9 


iPlacenta 


20.3 


21.0 


Colon cancer tissue 


15.6 


12.8
Uterus Pool 13.1


12.8 iColonca. SW11 16 


11.0 |12.5 


'Ovarian ca. OVCAR- ! - 
: |33.2 

■ i 


26.8 


Colon ca. Colo-205 


2.9 jS.1 


:Ovarian ca. SK-OV-3 136.6 


25.5 IColon ca. SW-48 


3.8 |6.5 


jOvarian ca. OVCAR- 
|4 


16.4 


i 

17.8 jColonPool 


19.2 


24.3 


lOvarian ca. OVCAR- |„ ~ 

i i 


58.2 jSmall Intestine Pool 


31.4 


33.7 


jOvarian ca. IGROV-1 jl5.7 


18.6 


Stomach Pool 


11.9 


14.8 


jOvarian ca. OVCAR- 

18 


5.8 


9.7 jBone Marrow Pool 


11.2 


10.6 


lOvary 


13.4 


12.1 jFetal Heart 


22.7 


,24.3 


IBreast ca. MCF-7 


47.0 


57.8 [Heart Pool 


10.5 


112.8 


iBreast ca. MDA-MB- 
231 


40.9 


48.3 jLymph Node Pool 

i 


27.0 


24.7 


: Breast ca. BT 549 


52.1 


50.0 jFetal Skeletal Muscle j 1 7.3 


18.3 


iBreast ca. T47D 


100.0 


100.0 {Skeletal Muscle Pool 


30.8 !28.5 


iBreast ca. MDA-N |I3.6 


21.8 iSpleenPool 


17.1 

19.8 " 


19.9 


iBreast Pool 122.2 


20.7 (Thymus Pool 


119.3 


i i 

:Trachea |21.5 

I 


9 . 9 |CNS cancer (glio/astro) 
IU87-MG 


31.9 145.4 


jLung !5.6 


5.3 


jCNS cancer (glio/astro) 
jlI-118-MG 


46.3 ]563 


Tetal Lung j36.9 

I ~ t . .. 


35.6 


iCNS cancer (neuro;met) 
jSK-N-AS 


29.3 127.4 

... a 


<Lungca.NCI-N4l7 U.O 

\ i 


jCNS cancer (astro) SF- 
/,J |539 


10.4 |12.5 

_ i. 


i 

Xungca. LX-1 (36.1 


id 6 ^ CNS cancer (astro) SNB- 


40.1 


51.4 
19.9 


jLung ca.NCI-H 146 |5.3 


6.3 


CNS cancer (glio) SNB- 
19 


14.2 


jLung ca. SHP-77 


13.5 


24.5 


CNS cancer (glio) SF-295 


49.7 |44.4 


jLung ca. A549 |22.5 


27.7 


Brain (Amygdala) Pool 


22.5 |20.7 


;Lungca.NCI-H526 |8.4 


12.5 


Brain (cerebellum) 


77.9 


79.0 


!Lungca. NCI-H23 


24.0 


35.1 


Brain (fetal) 


36.9 


37.1 


lung ca. NCI-H460 


9.9 


15.5 


Brain (Hippocampus) 
Pool 


18.8 


19.9 


jLung ca. HOP-62 j9.6 j 1 2.8 


Cerebral Cortex Pool 


19.9 
18.9 


21.3 


! 

iLungca. NCI-H522 


24.0 j23.8 


Brain (Substantia nigra) 
Pool 


19.6 


•Liver 


5.1 J 


5.2 


Brain (Thalamus) Pool ]30. 1 


31.0 


jFetal Liver 


14.0 


24.1 


Brain (whole) 


23.8 


25.5 


[Liver ca. HepG2 j 17.0 


18.8 


Spinal Cord Pool 


25.0 ]27.7 


■Kidney Pool j30.8 


43.2 


Adrenal Gland 


36.1 j34.6 


jFetal Kidney 


24.7 


28.5 


Pituitary gland Pool 


5.8 |8.2 


jRenal ca. 786-0 |21.8 


21.6 


Salivary Gland 


15.6 15.7
Renal ca. A498


|4.2 


j4.6 


iThyroid (female) 


13.4 


! 13.2 


|Renal ca. ACHN 


1 1 7.6 


j 1 8.8 


|Pancreatic ca. CAPAN2 


27.7 


|27.5 


Serial ca. UO-31 

: ... . ■■ 


|24.7 


H22.I " 
.1 


jPancreas Pool 


29.7 


•27.9 



Table XE. Panel 4.1D

Tissue Name | Rel. Exp.(%) Ag3597, Run 169910426 | Rel. Exp.(%) Ag3679, Run 169988037 | Tissue Name | Rel. Exp.(%) Ag3597, Run 169910426 | Rel. Exp.(%) Ag3679, Run 169988037
Secondary Th1 act | 63.7 | 64.6 | HUVEC IL-1beta | 25.9 | 18.8
Secondary Th2 act | 64.2 | 95.3 | HUVEC IFN gamma | 34.4 | 
Secondary Tr1 act | 82.4 | 87.7 | HUVEC TNF alpha + IFN gamma |  | 
Secondary Th1 rest | 26.8 | 41.8 | HUVEC TNF alpha + IL4 | 27.2 | 30.4
Secondary Th2 rest | 42.3 | 60.7 | HUVEC IL-11 | 13.1 | 21.6
Secondary Tr1 rest | 36.6 |  | Lung Microvascular EC none | 44.4 | 52.1
Primary Th1 act | 43.5 | 54.0 | Lung Microvascular EC TNFalpha + IL-1beta | 48.3 | 48.6
Primary Th2 act | 55.5 | 63.3 | Microvascular Dermal EC none | 24.3 | 35.1
Primary Tr1 act |  | 73.7 | Microvascular Dermal EC TNFalpha + IL-1beta | 25.9 | 24.8
Primary Th1 rest | 48.6 | 56.3 | Bronchial epithelium TNFalpha + IL1beta | 35.4 | 31.9
Primary Th2 rest | 46.7 | 57.4 | Small airway epithelium none | 17.2 | 18.7
Primary Tr1 rest | 49.7 | 69.3 | Small airway epithelium TNFalpha + IL-1beta | 38.2 | 46.3
CD45RA CD4 lymphocyte act | 51.4 | 63.3 | Coronary artery SMC rest | 24.0 | 36.6
CD45RO CD4 lymphocyte act | 66.9 | 95.3 | Coronary artery SMC TNFalpha + IL-1beta | 33.0 | 32.5
CD8 lymphocyte act | 58.6 | 75.8 | Astrocytes rest | 19.8 | 26.6
Secondary CD8 lymphocyte rest | 51.1 | 69.3 | Astrocytes TNFalpha + IL-1beta | 17.2 | 26.6
Secondary CD8 lymphocyte act | 38.7 | 37.9 | KU-812 (Basophil) rest | 37.1 | 50.7
CD4 lymphocyte none | 42.0 | 58.6 | KU-812 (Basophil) PMA/ionomycin | 72.7 | 68.3
2ry Th1/Th2/Tr1 anti-CD95 CH11 | 41.5 | 56.6 | CCD1106 (Keratinocytes) none | 65.1 | 64.2
LAK cells rest | 61.1 | 71.7 | CCD1106 (Keratinocytes) TNFalpha + IL-1beta | 48.3 | 58.6
LAK cells IL-2 | 61.1 | 72.7 | Liver cirrhosis | 14.9 | 
LAK cells IL-2+IL-12 | 100.0 | 62.0 | NCI-H292 none | 22.1 | 30.6
LAK cells IL-2+IFN gamma |  | 65.1 | NCI-H292 IL-4 | 36.9 | 42.0
LAK cells IL-2+IL-18 | 73.7 | 100.0 | NCI-H292 IL-9 | 62.9 | 70.2
LAK cells PMA/ionomycin | 58.6 | 83.5 | NCI-H292 IL-13 | 42.0 | 37.4
NK Cells IL-2 rest | 59.9 | 98.6 | NCI-H292 IFN gamma | 46.0 | 48.3
Two Way MLR 3 day | 72.7 | 65.5 | HPAEC none | 27.4 | 26.2
Two Way MLR 5 day | 43.8 | 56.3 | HPAEC TNF alpha + IL-1 beta | 37.1 | 48.3
Two Way MLR 7 day | 29.3 | 40.1 | Lung fibroblast none | 27.0 | 29.5
PBMC rest | 44.1 | 58.6 | Lung fibroblast TNF alpha + IL-1 beta | 17.2 | 24.7
PBMC PWM |  | 60.7 | Lung fibroblast IL-4 | 25.3 | 31.6
PBMC PHA-L | 31.9 | 52.5 | Lung fibroblast IL-9 | 45.4 | 43.2
Ramos (B cell) none | 65.5 | 87.1 | Lung fibroblast IL-13 | 30.1 | 25.0
Ramos (B cell) ionomycin |  | 87.1 | Lung fibroblast IFN gamma | 31.4 | 32.1
B lymphocytes PWM | 33.2 | 52.9 | Dermal fibroblast CCD1070 rest | 45.4 | 51.1
B lymphocytes CD40L and IL-4 | 58.2 | 78.5 | Dermal fibroblast CCD1070 TNF alpha | 74.2 | 98.6
EOL-1 dbcAMP | 40.1 | 60.3 | Dermal fibroblast CCD1070 IL-1 beta | 32.5 | 34.9
EOL-1 dbcAMP PMA/ionomycin | 50.7 | 75.8 | Dermal fibroblast IFN gamma | 20.3 | 27.9
Dendritic cells none | 41.5 | 52.9 | Dermal fibroblast IL-4 | 41.2 | 41.2
Dendritic cells LPS | 28.1 | 42.0 | Dermal Fibroblasts rest | 24.8 | 29.7
Dendritic cells anti-CD40 | 36.9 | 40.9 | Neutrophils TNFa+LPS | 15.6 | 29.5
Monocytes rest | 55.1 | 60.3 | Neutrophils rest | 84.1 | 76.8
Monocytes LPS | 57.4 | 82.4 | Colon | 34.9 | 34.4
Macrophages rest | 40.1 | 54.0 | Lung | 31.0 | 29.3
Macrophages LPS | 22.5 | 31.4 | Thymus | 90.1 | 85.3
HUVEC none | 15.0 | 24.0 | Kidney | 49.7 | 52.5
HUVEC starved | 28.1 | 29.7 |  |  | 







Table XF. general oncology screening panel v 2.4

Tissue Name | Rel. Exp.(%) Ag3597, Run 267747376 | Rel. Exp.(%) Ag3679, Run 267742157 | Tissue Name | Rel. Exp.(%) Ag3597, Run 267747376 | Rel. Exp.(%) Ag3679, Run 267742157
Colon cancer 1 | 20.4 | 28.1 | Bladder NAT 2 | 1.6 | 2.1
Colon NAT 1 | 24.5 | 13.2 | Bladder NAT 3 | 1.7 | 2.2
Colon cancer 2 | 32.5 | 28.7 | Bladder NAT 4 | 17.4 | 13.7
Colon NAT 2 | 20.0 | 22.8 | Prostate adenocarcinoma 1 | 50.3 | 69.7
Colon cancer 3 | 45.4 | 40.6 | Prostate adenocarcinoma 2 | 8.1 | 8.0
Colon NAT 3 | 42.9 | 34.4 | Prostate adenocarcinoma 3 | 27.2 | 20.2
Colon malignant cancer 4 | 75.3 | 73.7 | Prostate adenocarcinoma 4 | 32.5 | 33.2
Colon NAT 4 | 11.6 | 11.1 | Prostate NAT 5 | 7.0 | 7.3
Lung cancer 1 | 19.8 | 18.6 | Prostate adenocarcinoma 6 | 11.9 | 9.6
Lung NAT 1 | 3.8 |  | Prostate adenocarcinoma 7 | 9.3 | 10.5
Lung cancer 2 |  | 66.9 | Prostate adenocarcinoma 8 | 4.7 | 4.5
Lung NAT 2 | 3.7 | 4.6 | Prostate adenocarcinoma 9 | 38.7 | 28.9
Squamous cell carcinoma 3 | 55.5 | 47.3 | Prostate NAT 10 | 4.8 | 4.9
Lung NAT 3 | 2.0 | 1.8 | Kidney cancer 1 | 31.6 | 37.1
Metastatic melanoma 1 | 36.3 | 38.4 | Kidney NAT 1 | 14.7 | 17.6
Melanoma 2 | 9.9 | 10.7 | Kidney cancer 2 | 100.0 | 82.9
Melanoma 3 | 12.2 | 11.0 | Kidney NAT 2 | 41.8 | 31.9
Metastatic melanoma 4 | 89.5 | 92.7 | Kidney cancer 3 | 45.4 | 34.9
Metastatic melanoma 5 | 95.3 | 100.0 | Kidney NAT 3 | 11.4 | 12.1
Bladder cancer 1 | 4.7 | 3.1 | Kidney cancer 4 | 41.2 | 30.1
Bladder NAT 1 | 0.0 | 0.0 | Kidney NAT 4 | 21.8 | 12.8
Bladder cancer 2 | 12.0 | 10.5 |  |  | 







General_screening_panel_v1.4 Summary: Ag3597/Ag3679 Two experiments with
the same probe and primer produce results that are in excellent agreement. Highest
expression of this gene is seen in a breast cancer cell line. Higher levels of expression are also
seen in breast, prostate, ovarian and lung cancer cell lines when compared to expression in
normal tissue. Thus, expression of this gene could be used as a marker of these cancers and
therapeutic modulation of the activity of this gene may be effective in their treatment.

Among tissues with metabolic function, this gene is expressed at high to moderate
levels in pituitary, adipose, adrenal gland, pancreas, thyroid, and adult and fetal skeletal
muscle, heart, and liver. This widespread expression among these tissues suggests that this
gene product may play a role in normal neuroendocrine and metabolic function and that
dysregulated expression of this gene may contribute to neuroendocrine disorders or metabolic
diseases, such as obesity and diabetes.

This gene is also expressed at high to moderate levels in the CNS, including the
hippocampus, thalamus, substantia nigra, amygdala, cerebellum and cerebral cortex.
Therefore, therapeutic modulation of the expression or function of this gene may be useful in
the treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease,
schizophrenia, multiple sclerosis, stroke and epilepsy.

This gene codes for a variant of DMR protein and a homologue of mouse dystrophia
myotonica-containing WD repeat motif protein (DMR-N9 protein). DMR-N9 has been
implicated in myotonic dystrophy (MD). Therefore, therapeutic modulation of this gene
could be useful in the treatment of MD.

Panel 4.1D Summary: Ag3597/Ag3679 Two experiments with the same probe and
primer produce results that are in excellent agreement. Highest expression of this gene is seen
in cytokine activated LAK cells. In addition, this gene is expressed at high to moderate levels
in a wide range of cell types of significance in the immune response in health and disease.
These cells include members of the T-cell, B-cell, endothelial cell, macrophage/monocyte,
and peripheral blood mononuclear cell family, as well as epithelial and fibroblast cell types
from lung and skin, and normal tissues represented by colon, lung, thymus and kidney. This
ubiquitous pattern of expression suggests that this gene product may be involved in
homeostatic processes for these and other cell types and tissues. This pattern is in agreement
with the expression profile in General_screening_panel_v1.4 and also suggests a role for the
gene product in cell survival and proliferation. Therefore, modulation of the gene product
with a functional therapeutic may lead to the alteration of functions associated with these cell
types and lead to improvement of the symptoms of patients suffering from autoimmune and
inflammatory diseases such as asthma, allergies, inflammatory bowel disease, lupus
erythematosus, psoriasis, rheumatoid arthritis, and osteoarthritis.

general oncology screening panel_v_2.4 Summary: Ag3597/Ag3679 Two
experiments produce results that are in very good agreement, with highest expression in a
kidney cancer and a melanoma (CT=27.3-28.7). In addition, this gene is more highly
expressed in lung and kidney cancer than in the corresponding normal adjacent tissue. In
addition, consistent prominent expression is seen in melanoma and prostate cancer. Thus,
expression of this gene could be used as a marker of these cancers. Furthermore, therapeutic
modulation of the expression or function of this gene product may be useful in the treatment
of lung, melanoma, prostate and kidney cancer.
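
One way to put a number on the "very good agreement" between the two runs is a Pearson correlation over paired values. The sketch below is illustrative only: it uses the kidney cancer and kidney NAT rows of Table XF (Ag3597 versus Ag3679), and the choice of Pearson's r, the sample subset, and all names are assumptions rather than the analysis used in the original work.

```python
from math import sqrt

# (Ag3597, Ag3679) relative expression (%) for the kidney rows of Table XF.
KIDNEY_PAIRS = [
    (31.6, 37.1),   # Kidney cancer 1
    (14.7, 17.6),   # Kidney NAT 1
    (100.0, 82.9),  # Kidney cancer 2
    (41.8, 31.9),   # Kidney NAT 2
    (45.4, 34.9),   # Kidney cancer 3
    (11.4, 12.1),   # Kidney NAT 3
    (41.2, 30.1),   # Kidney cancer 4
    (21.8, 12.8),   # Kidney NAT 4
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


r = pearson_r(KIDNEY_PAIRS)
print(f"Pearson r between the two runs = {r:.2f}")
```

A coefficient close to 1 indicates that the two primer-probe sets rank the samples similarly; it does not by itself show that the absolute percentages match.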

Y. NOV87a and NOV87b: Glycolipid Transfer Protein-like

Expression of gene NOV87a and full-length physical clone NOV87b was assessed 
using the primer-probe set Ag6896, described in Table YA. Results of the RTQ-PCR runs are 
shown in Table YB. Please note that NOV87b represents a full-length physical clone of the 
NOV87a gene, validating the prediction of the gene sequence. 

Table YA. Probe Name Ag6896

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-cgtcaccgtggccttct-3' | 17 | 741 | 423
Probe | TET-5'-cacgctgcccacacgcgagg-3'-TAMRA | 20 | 759 | 424
Reverse | 5'-gttcatggcctccaggaaga-3' | 20 | 779 | 425



Table YB. General screening panel v1.6

Tissue Name | Rel. Exp.(%) Ag6896, Run 278388389 | Tissue Name | Rel. Exp.(%) Ag6896, Run 278388389
Adipose |  | Renal ca. TK-10 | 19.6
Melanoma* Hs688(A).T | 33.9 | Bladder | 17.0
Melanoma* Hs688(B).T | 29.1 | Gastric ca. (liver met.) NCI-N87 | 69.3
Melanoma* M14 | 50.7 | Gastric ca. KATO III | 55.1
Melanoma* LOXIMVI | 20.9 | Colon ca. SW-948 | 64.2
Melanoma* SK-MEL-5 | 20.6 | Colon ca. SW480 | 65.5
Squamous cell carcinoma SCC-4 | 53.2 | Colon ca.* (SW480 met) SW620 | 17.2
Testis Pool | 35.1 | Colon ca. HT29 | 22.1
Prostate ca.* (bone met) PC-3 | 97.9 | Colon ca. HCT-116 | 36.6
Prostate Pool | 9.5 | Colon ca. CaCo-2 | 26.1
Placenta | 26.2 | Colon cancer tissue | 33.0
Uterus Pool | 4.1 | Colon ca. SW1116 | 7.9
Ovarian ca. OVCAR-3 |  | Colon ca. Colo-205 | 17.4
Ovarian ca. SK-OV-3 |  | Colon ca. SW-48 | 25.0
Ovarian ca. OVCAR-4 |  | Colon Pool | 19.1
Ovarian ca. OVCAR-5 |  | Small Intestine Pool | 15.6
Ovarian ca. IGROV-1 | 57.5 | Stomach Pool | 8.7
Ovarian ca. OVCAR-8 |  | Bone Marrow Pool | 6.9
Ovary | 0.0 | Fetal Heart | 18.8
Breast ca. MCF-7 | 36.3 | Heart Pool | 16.2
Breast ca. MDA-MB-231 | 100.0 | Lymph Node Pool | 14.6
Breast ca. BT 549 | 79.0 | Fetal Skeletal Muscle | 10.3
Breast ca. T47D | 18.8 | Skeletal Muscle Pool | 15.9
Breast ca. MDA-N | 60.3 | Spleen Pool | 16.2
Breast Pool | 17.8 | Thymus Pool | 19.3
Trachea | 28.7 | CNS cancer (glio/astro) U87-MG | 47.6
Lung | 1.4 | CNS cancer (glio/astro) U-118-MG | 73.8
Fetal Lung | 21.9 | CNS cancer (neuro;met) SK-N-AS | 38.4
Lung ca. NCI-N417 | 33.7 | CNS cancer (astro) SF-539 | 34.6
Lung ca. LX-1 | 22.2 | CNS cancer (astro) SNB-75 | 79.6
Lung ca. NCI-H146 | 13.3 | CNS cancer (glio) SNB-19 | 60.7
Lung ca. SHP-77 | 33.2 | CNS cancer (glio) SF-295 | 40.6
Lung ca. A549 | 24.5 | Brain (Amygdala) Pool | 20.0
Lung ca. NCI-H526 | 27.0 | Brain (cerebellum) | 12.5
Lung ca. NCI-H23 | 32.5 | Brain (fetal) | 40.1
Lung ca. NCI-H460 | 9.9 | Brain (Hippocampus) Pool | 18.2
Lung ca. HOP-62 | 15.8 | Cerebral Cortex Pool | 19.1
Lung ca. NCI-H522 | 54.7 | Brain (Substantia nigra) Pool | 21.0
Liver | 21.8 | Brain (Thalamus) Pool | 19.2
Fetal Liver | 14.3 | Brain (whole) | 35.6
Liver ca. HepG2 | 19.6 | Spinal Cord Pool | 16.5
Kidney Pool | 21.9 | Adrenal Gland | 35.1
Fetal Kidney | 14.4 | Pituitary gland Pool | 
Renal ca. 786-0 | 28.5 | Salivary Gland | 46.7
Renal ca. A498 | 17.9 | Thyroid (female) | 11.4
Renal ca. ACHN | 18.9 | Pancreatic ca. CAPAN2 | 51.4
Renal ca. UO-31 | 24.8 | Pancreas Pool | 11.0



General_screening_panel_v1.6 Summary: Ag6896 Highest expression of this gene
is seen in a breast cancer cell line (CT=27.8). This gene is ubiquitously expressed in this
panel, with moderate expression seen in brain, colon, gastric, lung, breast, ovarian, and
melanoma cancer cell lines. This expression profile suggests a role for this gene product in
cell survival and proliferation. Modulation of this gene product may be useful in the
treatment of cancer.

In addition, this gene is expressed at much higher levels in fetal lung tissue (CT=29.8)
when compared to expression in the adult counterpart (CT=33.8). Thus, expression of this
gene may be used to differentiate between the fetal and adult source of this tissue.
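
The size of the fetal versus adult difference can be estimated from the two CT values. The sketch below assumes that each PCR cycle corresponds to a two-fold change in template (about 100% amplification efficiency), which this passage does not state; the function name and the `efficiency` parameter are hypothetical.

```python
def fold_difference(ct_lower_expr: float, ct_higher_expr: float,
                    efficiency: float = 2.0) -> float:
    """Estimate the fold difference in expression from two CT values.

    Assumption: template amount multiplies by `efficiency` per cycle,
    so a sample with a lower CT started with more template.
    """
    return efficiency ** (ct_lower_expr - ct_higher_expr)


# Adult lung (CT=33.8) versus fetal lung (CT=29.8), as quoted in the summary.
fold = fold_difference(33.8, 29.8)
print(f"Fetal lung expression is about {fold:.0f}-fold higher than adult lung")
```

Under this assumption a 4-cycle gap corresponds to roughly a 16-fold difference, which is in line with the relative expression values listed for Fetal Lung (21.9) and Lung (1.4) in Table YB.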

Among tissues with metabolic function, this gene is expressed at moderate levels in
pituitary, adipose, adrenal gland, pancreas, thyroid, and adult and fetal skeletal muscle, heart,
and liver. This widespread expression among these tissues suggests that this gene product
may play a role in normal neuroendocrine and metabolic function and that dysregulated
expression of this gene may contribute to neuroendocrine disorders or metabolic diseases,
such as obesity and diabetes.

This gene is also expressed at moderate levels in the CNS, including the
hippocampus, thalamus, substantia nigra, amygdala, cerebellum and cerebral cortex.
Therefore, therapeutic modulation of the expression or function of this gene may be useful in
the treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease,
schizophrenia, multiple sclerosis, stroke and epilepsy.

Z. NOV88a: Copine VII 

Expression of gene NOV88a was assessed using the primer-probe set Ag3641,
described in Table ZA. Results of the RTQ-PCR runs are shown in Tables ZB and ZC.

Table ZA. Probe Name Ag3641

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-tggactattacaatggcaaagg-3' | 22 | 1748 | 426
Probe | TET-5'-atgaatcttccagcacactagcacca-3'-TAMRA | 26 | 1799 | 427
Reverse | 5'-gtaaaactgtgtggggagttca-3' | 22 | 1825 | 428



Table ZB. CNS neurodegeneration v1.0

Tissue Name | Rel. Exp.(%) Ag3641, Run 212315202 | Tissue Name | Rel. Exp.(%) Ag3641, Run 212315202
AD 1 Hippo | 2.7 | Control (Path) 3 Temporal Ctx | 
AD 2 Hippo | 6.0 | Control (Path) 4 Temporal Ctx | 
AD 3 Hippo | 1.0 | AD 1 Occipital Ctx | 2.7
AD 4 Hippo | 0.0 | AD 2 Occipital Ctx (Missing) | 0.0
AD 5 Hippo | 100.0 | AD 3 Occipital Ctx | 1.0
AD 6 Hippo | 24.8 | AD 4 Occipital Ctx | 7.5
Control 2 Hippo |  | AD 5 Occipital Ctx | 6.0
Control 4 Hippo | 1.0 | AD 6 Occipital Ctx | 18.9
Control (Path) 3 Hippo | 0.3 | Control 1 Occipital Ctx | 0.0
AD 1 Temporal Ctx | 1.9 | Control 2 Occipital Ctx | 
AD 2 Temporal Ctx | 9.0 | Control 3 Occipital Ctx | 4.8
AD 3 Temporal Ctx | 1.2 | Control 4 Occipital Ctx | 1.2
AD 4 Temporal Ctx | 2.7 | Control (Path) 1 Occipital Ctx | 26.4
AD 5 Inf Temporal Ctx |  | Control (Path) 2 Occipital Ctx | 1.7
AD 5 Sup Temporal Ctx | 19.6 | Control (Path) 3 Occipital Ctx | 0.4
AD 6 Inf Temporal Ctx | 7.6 | Control (Path) 4 Occipital Ctx | 6.0
AD 6 Sup Temporal Ctx | 13.6 | Control 1 Parietal Ctx | 
Control 1 Temporal Ctx | 0.4 | Control 2 Parietal Ctx | 11.6
Control 2 Temporal Ctx | 6.1 | Control 3 Parietal Ctx | 4.5
Control 3 Temporal Ctx | 4.1 | Control (Path) 1 Parietal Ctx | 15.7
Control 4 Temporal Ctx | 1.3 | Control (Path) 2 Parietal Ctx | 12.7
Control (Path) 1 Temporal Ctx | 26.6 | Control (Path) 3 Parietal Ctx | 0.2
Control (Path) 2 Temporal Ctx | 11.7 | Control (Path) 4 Parietal Ctx | 15.7


Table ZC. General screening panel v1.4

Tissue Name | Rel. Exp.(%) Ag3641, Run 218306189 | Tissue Name | Rel. Exp.(%) Ag3641, Run 218306189
Adipose | [illegible] | Renal ca. TK-10 | 0.0
Melanoma* Hs688(A).T | [illegible] | Bladder | 0.0
Melanoma* Hs688(B).T | 1.2 | Gastric ca. (liver met.) NCI-N87 | 1.7
Melanoma* M14 | 0.0 | Gastric ca. KATO III | 0.0
Melanoma* LOXIMVI | 0.0 | Colon ca. SW-948 | 0.0
Melanoma* SK-MEL-5 | 0.0 | Colon ca. SW480 | 0.0
Squamous cell carcinoma SCC-4 | 0.0 | Colon ca.* (SW480 met) SW620 | 0.0
Testis Pool | 4.4 | Colon ca. HT29 | 0.0
Prostate ca.* (bone met) PC-3 | 1.2 | Colon ca. HCT-116 | 0.0
Prostate Pool | 100.0 | Colon ca. CaCo-2 | 0.0
Placenta | 0.0 | Colon cancer tissue | 0.0
Uterus Pool | 0.0 | Colon ca. SW1116 | 0.0
Ovarian ca. OVCAR-3 | 66.9 | Colon ca. Colo-205 | 0.0
Ovarian ca. SK-OV-3 | 0.0 | Colon ca. SW-48 | [illegible]
Ovarian ca. OVCAR-4 | 3.0 | Colon Pool | 0.0
Ovarian ca. OVCAR-5 | 0.0 | Small Intestine Pool | 0.0
Ovarian ca. IGROV-1 | 14.0 | Stomach Pool | 0.0
Ovarian ca. OVCAR-8 | 10.4 | Bone Marrow Pool | 0.0
Ovary | 4.8 | Fetal Heart | [illegible]
Breast ca. MCF-7 | 3.8 | Heart Pool | 32.8
Breast ca. MDA-MB-231 | 1.2 | Lymph Node Pool | [illegible]
Breast ca. BT 549 | 0.0 | Fetal Skeletal Muscle | 0.0
Breast ca. T47D | 1.4 | Skeletal Muscle Pool | 0.0
Breast ca. MDA-N | 10.0 | Spleen Pool | 6.4
Breast Pool | 1.4 | Thymus Pool | 0.0
Trachea | 10.7 | CNS cancer (glio/astro) U87-MG | 0.0
Lung | 0.0 | CNS cancer (glio/astro) U-118-MG | 0.0
Fetal Lung | 1.8 | CNS cancer (neuro;met) SK-N-AS | 0.0
Lung ca. NCI-N417 | 0.0 | CNS cancer (astro) SF-539 | 0.0
Lung ca. LX-1 | 0.0 | CNS cancer (astro) SNB-75 | 0.0
Lung ca. NCI-H146 | 14.0 | CNS cancer (glio) SNB-19 | 11.3
Lung ca. SHP-77 | 0.0 | CNS cancer (glio) SF-295 | 0.0
Lung ca. A549 | 75.3 | Brain (Amygdala) Pool | 54.3
Lung ca. NCI-H526 | 0.0 | Brain (cerebellum) | 0.0
Lung ca. NCI-H23 | 4.9 | Brain (fetal) | 54.0
Lung ca. NCI-H460 | 5.1 | Brain (Hippocampus) Pool | 59.5
Lung ca. HOP-62 | [illegible] | Cerebral Cortex Pool | 82.4
Lung ca. NCI-H522 | 0.0 | Brain (Substantia nigra) Pool | 62.4
Liver | 0.0 | Brain (Thalamus) Pool | 92.7
Fetal Liver | 0.0 | Brain (whole) | 39.0
Liver ca. HepG2 | 0.0 | Spinal Cord Pool | 17.9
Kidney Pool | 1.4 | Adrenal Gland | 2.6
Fetal Kidney | 1.9 | Pituitary gland Pool | 24.5
Renal ca. 786-0 | 0.0 | Salivary Gland | 1.6
Renal ca. A498 | 0.0 | Thyroid (female) | 0.0
Renal ca. ACHN | 0.0 | Pancreatic ca. CAPAN2 | 2.7
Renal ca. UO-31 | 0.0 | Pancreas Pool | 0.0



CNS_neurodegeneration_v1.0 Summary: Ag3641 This panel does not show
differential expression of this gene in Alzheimer's disease. However, this profile confirms the
expression of this gene at moderate levels in the brain. Therefore, therapeutic modulation of
the expression or function of this gene may be useful in the treatment of neurological
disorders, such as Alzheimer's disease, Parkinson's disease, schizophrenia, multiple sclerosis,
stroke and epilepsy.

General_screening_panel_v1.4 Summary: Ag3641 Highest expression is seen in
the prostate (CT=33). Prominent expression of this gene is seen in cell lines from ovarian
and lung cancer as well as all regions of the brain in this panel. Therefore, therapeutic
modulation of this gene may be useful in the treatment of prostate-related diseases, as well as
lung and ovarian cancers.

In addition, moderate to low levels of expression of this gene are also seen in all the
regions of the brain. Therefore, therapeutic modulation of this gene product may be useful in the
treatment of neurological disorders such as Alzheimer's disease, Parkinson's disease,
epilepsy, multiple sclerosis, schizophrenia and depression.

AA. NOV89b: intracellular protein 

Expression of full-length physical clone NOV89b was assessed using the primer-probe
set Ag6911, described in Table AAA. Results of the RTQ-PCR runs are shown in
Table AAB.

Table AAA. Probe Name Ag6911

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-gtattcccaaagaaaggcatctt-3' | 23 | 412 | 429
Probe | TET-5'-atgaaagcaaagccagtgctcccttc-3'-TAMRA | 26 | 442 | 430
Reverse | 5'-gcatgtacagtggccagga-3' | 19 | 468 | 431



Table AAB. General screening panel v1.6

Tissue Name | Rel. Exp.(%) Ag6911, Run 278388417 | Tissue Name | Rel. Exp.(%) Ag6911, Run 278388417
Adipose | 9.3 | Renal ca. TK-10 | 63.3
Melanoma* Hs688(A).T | 17.1 | Bladder | 20.4
Melanoma* Hs688(B).T | 18.7 | Gastric ca. (liver met.) NCI-N87 | 79.6
Melanoma* M14 | 49.7 | Gastric ca. KATO III | 63.7
Melanoma* LOXIMVI | 22.1 | Colon ca. SW-948 | 22.4
Melanoma* SK-MEL-5 | 42.3 | Colon ca. SW480 | 40.3
Squamous cell carcinoma SCC-4 | 19.2 | Colon ca.* (SW480 met) SW620 | 24.7
Testis Pool | 16.0 | Colon ca. HT29 | 21.9
Prostate ca.* (bone met) PC-3 | 13.0 | Colon ca. HCT-116 | 46.7
Prostate Pool | 17.2 | Colon ca. CaCo-2 | 33.2
Placenta | 7.0 | Colon cancer tissue | 14.9
Uterus Pool | 2.4 | Colon ca. SW1116 | 12.3
Ovarian ca. OVCAR-3 | 37.9 | Colon ca. Colo-205 | 19.5
Ovarian ca. SK-OV-3 | 55.5 | Colon ca. SW-48 | 14.2
Ovarian ca. OVCAR-4 | 42.0 | Colon Pool | 15.1
Ovarian ca. OVCAR-5 | 41.2 | Small Intestine Pool | 9.3
Ovarian ca. IGROV-1 | 26.6 | Stomach Pool | [illegible]
Ovarian ca. OVCAR-8 | 21.8 | Bone Marrow Pool | 4.2
Ovary | 5.6 | Fetal Heart | 6.4
Breast ca. MCF-7 | 17.3 | Heart Pool | 18.7
Breast ca. MDA-MB-231 | 66.9 | Lymph Node Pool | 10.6
Breast ca. BT 549 | 100.0 | Fetal Skeletal Muscle | 3.2
Breast ca. T47D | 17.7 | Skeletal Muscle Pool | [illegible]
Breast ca. MDA-N | 14.7 | Spleen Pool | 6.5
Breast Pool | 9.9 | Thymus Pool | 7.5
Trachea | 12.6 | CNS cancer (glio/astro) U87-MG | 43.8
Lung | 4.8 | CNS cancer (glio/astro) U-118-MG | 29.1
Fetal Lung | 17.1 | CNS cancer (neuro;met) SK-N-AS | 29.3
Lung ca. NCI-N417 | 8.1 | CNS cancer (astro) SF-539 | 21.0
Lung ca. LX-1 | 39.5 | CNS cancer (astro) SNB-75 | 58.2
Lung ca. NCI-H146 | 10.2 | CNS cancer (glio) SNB-19 | 26.1
Lung ca. SHP-77 | 61.6 | CNS cancer (glio) SF-295 | 34.2
Lung ca. A549 | 39.2 | Brain (Amygdala) Pool | 2.6
Lung ca. NCI-H526 | 13.4 | Brain (cerebellum) | 3.5
Lung ca. NCI-H23 | 57.8 | Brain (fetal) | 3.1
Lung ca. NCI-H460 | 12.2 | Brain (Hippocampus) Pool | [illegible]
Lung ca. HOP-62 | 8.8 | Cerebral Cortex Pool | 4.3
Lung ca. NCI-H522 | 39.8 | Brain (Substantia nigra) Pool | 2.8
Liver | 1.3 | Brain (Thalamus) Pool | 4.5
Fetal Liver | 6.4 | Brain (whole) | 1.8
Liver ca. HepG2 | 8.4 | Spinal Cord Pool | 13.4
Kidney Pool | 15.7 | Adrenal Gland | 9.7
Fetal Kidney | 11.3 | Pituitary gland Pool | 5.2
Renal ca. 786-0 | 50.7 | Salivary Gland | 11.1
Renal ca. A498 | 17.6 | Thyroid (female) | 10.0
Renal ca. ACHN | 26.6 | Pancreatic ca. CAPAN2 | 26.1
Renal ca. UO-31 | 44.1 | Pancreas Pool | 4.9



General_screening_panel_v1.6 Summary: Ag6911 Highest expression of this gene
is seen in a breast cancer cell line (CT=29.8). This gene is widely expressed in this panel,
with moderate expression seen in brain, colon, gastric, lung, breast, ovarian, and melanoma
cancer cell lines. This expression profile suggests a role for this gene product in cell survival
and proliferation. Modulation of this gene product may be useful in the treatment of cancer.

Among tissues with metabolic function, this gene is expressed at low but significant
levels in pituitary, adipose, adrenal gland, pancreas, thyroid, fetal liver and adult and fetal
skeletal muscle and heart. This widespread expression among these tissues suggests that this
gene product may play a role in normal neuroendocrine and metabolic function and that
dysregulated expression of this gene may contribute to neuroendocrine disorders or metabolic
diseases, such as obesity and diabetes.

This gene is also expressed at low but significant levels in the CNS, including the
hippocampus, thalamus, substantia nigra, cerebellum and cerebral cortex. Therefore,
therapeutic modulation of the expression or function of this gene may be useful in the
treatment of neurologic disorders, such as Alzheimer's disease, Parkinson's disease,
schizophrenia, multiple sclerosis, stroke and epilepsy.

AB. NOV91c and NOV91b: FIP-2 like 

Expression of full-length physical clones NOV91c and variant NOV91b was assessed
using the primer-probe set Ag6162, described in Table ABA.

Table ABA. Probe Name Ag6162

Primers | Sequences | Length | Start Position | SEQ ID No
Forward | 5'-ttgtgtgtcatctgtagcacagtta-3' | 25 | 746 | 432
Probe | TET-5'-tggacttttcatcctctgttttagcc-3'-TAMRA | 26 | 774 | 433
Reverse | 5'-gctatcagaaatcaaaatggaaca-3' | 24 | 800 | 434

Example D: Identification of Single Nucleotide Polymorphisms in NOVX nucleic acid 
sequences 

Variant sequences are also included in this application. A variant sequence can
include a single nucleotide polymorphism (SNP). A SNP can, in some instances, be referred
to as a "cSNP" to denote that the nucleotide sequence containing the SNP originates as a
cDNA. A SNP can arise in several ways. For example, a SNP may be due to a substitution of
one nucleotide for another at the polymorphic site. Such a substitution can be either a
transition or a transversion. A SNP can also arise from a deletion of a nucleotide or an
insertion of a nucleotide, relative to a reference allele. In this case, the polymorphic site is a
site at which one allele bears a gap with respect to a particular nucleotide in another allele.
SNPs occurring within genes may result in an alteration of the amino acid encoded by the
gene at the position of the SNP. Intragenic SNPs may also be silent, when a codon including
a SNP encodes the same amino acid as a result of the redundancy of the genetic code. SNPs
occurring outside the region of a gene, or in an intron within a gene, do not result in changes
in any amino acid sequence of a protein but may result in altered regulation of the expression
pattern. Examples include alteration in temporal expression, physiological response
regulation, cell type expression regulation, intensity of expression, and stability of transcribed
message.
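The distinction between a transition and a transversion mentioned above can be made concrete with a short sketch. A transition exchanges one purine for the other (A and G) or one pyrimidine for the other (C and T); any purine/pyrimidine exchange is a transversion. The function name `substitution_type` is hypothetical and serves only as an illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(initial: str, modified: str) -> str:
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    a, b = initial.upper(), modified.upper()
    if a == b:
        raise ValueError("initial and modified nucleotides are identical")
    if not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError("nucleotides must be one of A, C, G, T")
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("G", "A"))  # purine to purine
print(substitution_type("C", "A"))  # pyrimidine to purine
```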

SeqCalling assemblies produced by the exon linking process were selected and
extended using the following criteria. Genomic clones having regions with 98% identity to
all or part of the initial or extended sequence were identified by BLASTN searches using the
relevant sequence to query human genomic databases. The genomic clones that resulted were
selected for further analysis because this identity indicates that these clones contain the
genomic locus for these SeqCalling assemblies. These sequences were analyzed for putative
coding regions as well as for similarity to the known DNA and protein sequences. Programs
used for these analyses include Grail, Genscan, BLAST, HMMER, FASTA, Hybrid and other
relevant programs.
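The 98% identity criterion described above can be sketched as a simple filter over a pairwise alignment. The example below does not run BLASTN; it assumes an already aligned query/subject pair (gaps written as "-") and computes percent identity as identical columns divided by alignment length, which is how BLAST-style identity is commonly reported. The function names and the default cutoff argument are illustrative assumptions.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over a gapped pairwise alignment of equal-length strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # Count columns where both sequences carry the same base; gap columns never match.
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def passes_identity_cutoff(aligned_query: str, aligned_subject: str,
                           cutoff: float = 98.0) -> bool:
    """True if the aligned region meets the identity cutoff (98% in the text)."""
    return percent_identity(aligned_query, aligned_subject) >= cutoff


query = "ACGT" * 25                  # 100 aligned columns
one_mismatch = "T" + query[1:]       # differs from the query at one column
print(percent_identity(query, one_mismatch), passes_identity_cutoff(query, one_mismatch))
```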

Some additional genomic regions may have also been identified because selected
SeqCalling assemblies map to those regions. Such SeqCalling sequences may have
overlapped with regions defined by homology or exon prediction. They may also be included
because the location of the fragment was in the vicinity of genomic regions identified by
similarity or exon prediction that had been included in the original predicted sequence. The
sequence so identified was manually assembled and then may have been extended using one
or more additional sequences taken from CuraGen Corporation's human SeqCalling database.
SeqCalling fragments suitable for inclusion were identified by the CuraTools™ program
SeqExtend or by identifying SeqCalling fragments mapping to the appropriate regions of the
genomic clones analyzed.

The regions defined by the procedures described above were then manually integrated
and corrected for apparent inconsistencies that may have arisen, for example, from miscalled
bases in the original fragments or from discrepancies between predicted exon junctions, EST
locations and regions of sequence similarity, to derive the final sequence disclosed herein.
When necessary, the process to identify and analyze SeqCalling assemblies and genomic
clones was reiterated to derive the full length sequence (Alderborn et al., Determination of
Single Nucleotide Polymorphisms by Real-time Pyrophosphate DNA Sequencing. Genome
Research. 10(8) 1249-1265, 2000).

Variants are reported individually, but any combination of all or a select subset of
variants is also included as contemplated NOVX embodiments of the invention.

NOV1a SNP data:

NOV1a has 4 SNP variants, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 1 and 2, respectively. The nucleotide
sequences of the NOV1a variants differ as shown in Table DA.

Table DA. cSNP and Coding Variants for NOV1a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380191 | 87 | G | A | 6 | Ser | Asn
13380274 | 155 | T | C | 29 | Ser | Pro
13380190 | 186 | C | T | 39 | Ser | Leu
13380275 | 200 | C | T | 44 | His | Tyr
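The coding consequence of each tabulated variant can be read directly from its "Initial" and "Modified" amino acid columns. The sketch below classifies rows in that way, using the four NOV1a variants of Table DA as input. It rests on two assumptions that are not stated in the source: that an amino acid position of 0 with blank amino acid cells marks a SNP outside the coding region, and that "End" denotes a stop codon. The function name `coding_effect` is hypothetical.

```python
from typing import Optional


def coding_effect(aa_position: int, initial_aa: Optional[str],
                  modified_aa: Optional[str]) -> str:
    """Classify a SNP row by comparing its initial and modified amino acids."""
    # Assumption: position 0 or missing amino acids means the SNP is non-coding.
    if aa_position == 0 or not initial_aa or not modified_aa:
        return "non-coding"
    if initial_aa == modified_aa:
        return "silent"
    # Assumption: "End" marks a stop codon introduced by the SNP.
    if modified_aa == "End":
        return "nonsense"
    return "missense"


# Rows of Table DA: (variant, nt position, initial, modified, aa position, initial aa, modified aa)
TABLE_DA = [
    ("13380191", 87, "G", "A", 6, "Ser", "Asn"),
    ("13380274", 155, "T", "C", 29, "Ser", "Pro"),
    ("13380190", 186, "C", "T", 39, "Ser", "Leu"),
    ("13380275", 200, "C", "T", 44, "His", "Tyr"),
]

for variant, _, _, _, aa_pos, aa_initial, aa_modified in TABLE_DA:
    print(variant, coding_effect(aa_pos, aa_initial, aa_modified))
```

All four NOV1a variants change the encoded amino acid and are therefore reported as missense by this sketch.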



NOV2a SNP data:

NOV2a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 3 and 4, respectively. The nucleotide
sequence of the NOV2a variant differs as shown in Table DB.

Table DB. cSNP and Coding Variants for NOV2a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380382 | 1046 | G | T | 307 | Val | Val



NOV6a SNP data:

NOV6a has 3 SNP variants, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 19 and 20, respectively. The nucleotide
sequences of the NOV6a variants differ as shown in Table DC.

Table DC. cSNP and Coding Variants for NOV6a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380329 | 73 | C | T | 17 | Pro | Ser
13380317 | 180 | C | T | 52 | Cys | Cys
13380318 | 323 | T | C | 100 | Val | Ala



NOV7a SNP data:

NOV7a has 2 SNP variants, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 23 and 24, respectively. The nucleotide
sequences of the NOV7a variants differ as shown in Table DD.

Table DD. cSNP and Coding Variants for NOV7a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380328 | 881 | A | G | 283 | Thr | Ala
13380327 | 1429 | G | A | 465 | Glu | Glu



NOV8a SNP data:

NOV8a has 8 SNP variants, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 33 and 34, respectively. The nucleotide
sequences of the NOV8a variants differ as shown in Table DE.

Table DE. cSNP and Coding Variants for NOV8a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380302 | 360 | T | C | 8 | Asp | Asp
13380301 | 545 | A | G | 70 | Asn | Ser
13380300 | 677 | G | A | 114 | Arg | Lys
13380299 | 1433 | A | C | 0 | |
13380298 | 1470 | C | T | 0 | |
13380297 | 1711 | G | C | 0 | |
13380296 | 1717 | A | G | 0 | |
13380295 | 1925 | C | T | 0 | |

NOV9b SNP data:

NOV9b has 7 SNP variants, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 37 and 38, respectively. The nucleotide
sequences of the NOV9b variants differ as shown in Table DF.

Table DF. cSNP and Coding Variants for NOV9b

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380197 | 47 | C | T | 16 | Pro | Leu
13380196 | 258 | G | A | 86 | Glu | Glu
13375934 | 309 | C | T | 103 | Ala | Ala
13375935 | 340 | G | A | 114 | Ala | Thr
13375936 | 393 | T | C | 131 | Asp | Asp
13380199 | 435 | G | A | 145 | Leu | Leu
13375938 | 457 | G | A | 153 | Gly | Ser



NOV10a SNP data:

NOV10a has 24 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 39 and 40, respectively. The
nucleotide sequences of the NOV10a variants differ as shown in Table DG.

Table DG. cSNP and Coding Variants for NOV10a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380337 | 63 | G | C | 6 | Arg | Pro
13380342 | 220 | C | T | 58 | Asn | Asn
13380343 | 282 | G | A | 79 | Ser | Asn
13380344 | 299 | C | T | 85 | Arg | Trp
13380345 | 303 | G | A | 86 | Arg | Lys
13380346 | 345 | C | T | 100 | Pro | Leu
13380347 | 362 | A | G | 106 | Ser | Gly
13380348 | 445 | T | C | 133 | Ile | Ile
13380349 | 520 | G | A | 158 | Glu | Glu
[illegible] | [illegible] | [illegible] | T | 170 | Asp | Tyr
[illegible] | [illegible] | [illegible] | T | 171 | Thr | Thr
[illegible] | 576 | [illegible] | [illegible] | 177 | Gln | Arg
[illegible] | [illegible] | A | [illegible] | [illegible] | [illegible] | [illegible]
[illegible] | 670 | [illegible] | [illegible] | 208 | Ala | Ala
[illegible] | 737 | A | [illegible] | 231 | Ile | Val
[illegible] | 772 | [illegible] | [illegible] | 242 | Asn | Asn
[illegible] | 773 | [illegible] | [illegible] | 243 | Arg | [illegible]
13380358 | [illegible] | T | [illegible] | [illegible] | Arg | Arg
13380307 | 1120 | C | T | 358 | Thr | Thr
13380168 | 1157 | C | T | 371 | Leu | Phe
13380167 | 1265 | C | T | 407 | Leu | Leu
13380305 | 1273 | T | C | 409 | His | His
13380306 | 1307 | C | T | 0 | |
13380359 | 1309 | G | A | 0 | |







NOV11a SNP data:

NOV11a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 43 and 44, respectively. The
nucleotide sequences of the NOV11a variants differ as shown in Table DH.

Table DH. cSNP and Coding Variants for NOV11a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380247 | 57 | T | C | 0 | |
13380230 | 264 | C | A | 50 | Thr | Thr
13380304 | 640 | A | G | 176 | Met | Val

NOV12b SNP data:

NOV12b has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 47 and 48, respectively. The nucleotide
sequence of the NOV12b variant differs as shown in Table DI.

Table DI. cSNP and Coding Variants for NOV12b

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380285 | 366 | C | T | 0 | |







NOV13a SNP data:

NOV13a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 49 and 50, respectively. The
nucleotide sequences of the NOV13a variants differ as shown in Table DJ.

Table DJ. cSNP and Coding Variants for NOV13a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380334 | 91 | C | A | 16 | Ala | Glu
13380333 | 997 | C | T | 318 | Ser | Phe



NOV14a SNP data:

NOV14a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 51 and 52, respectively. The
nucleotide sequences of the NOV14a variants differ as shown in Table DK.

Table DK. cSNP and Coding Variants for NOV14a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380308 | 147 | T | C | 44 | Val | Ala
13380309 | 597 | C | T | 194 | Pro | Leu
13380310 | 786 | T | C | 257 | Leu | Pro

NOV16a SNP data:

NOV16a has 5 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 57 and 58, respectively. The
nucleotide sequences of the NOV16a variants differ as shown in Table DL.



Table DL. cSNP and Coding Variants for NOV16a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380246 | 575 | A | G | 176 | Asn | Ser
13380245 | 603 | C | T | 185 | Gly | Gly
13380239 | 1175 | C | A | 376 | Pro | Gln
13380238 | 1253 | G | A | 402 | Ser | Asn
13380237 | 1390 | T | G | 448 | Leu | Val



NOV18a SNP data:

NOV18a has 5 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 61 and 62, respectively. The
nucleotide sequences of the NOV18a variants differ as shown in Table DM.

Table DM. cSNP and Coding Variants for NOV18a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380324 | 42 | T | G | 0 | |
13380323 | 163 | A | G | 32 | Thr | Ala
13380289 | 3363 | A | G | 1098 | Glu | Glu
13380226 | 3489 | C | T | 1140 | Ser | Ser
13380227 | 3782 | C | A | 1238 | Thr | Lys



NOV19a SNP data:

NOV19a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 67 and 68, respectively. The
nucleotide sequences of the NOV19a variants differ as shown in Table DN.

Table DN. cSNP and Coding Variants for NOV19a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380176 | 260 | G | A | 69 | Ser | Ser
13380177 | 371 | T | C | 106 | Ile | Ile



NOV20a SNP data:

NOV20a has 4 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 71 and 72, respectively. The
nucleotide sequences of the NOV20a variants differ as shown in Table DO.

Table DO. cSNP and Coding Variants for NOV20a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380380 | 147 | A | G | 34 | Pro | Pro
13380381 | 379 | C | T | 112 | Pro | Ser
13380325 | 2939 | A | C | 965 | His | Pro
13380331 | 4433 | C | T | 1463 | Ala | Val



NOV21a SNP data:

NOV21a has 9 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 85 and 86, respectively. The
nucleotide sequences of the NOV21a variants differ as shown in Table DP.

Table DP. cSNP and Coding Variants for NOV21a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380404 | 75 | C | A | 23 | Pro | Thr
13380403 | 155 | C | A | 49 | Pro | Pro
[illegible] | [illegible] | A | [illegible] | [illegible] | Asn | Ser
13380401 | 292 | A | G | 95 | Asp | Gly
13380400 | 331 | C | T | 108 | Ser | Leu
13380399 | 363 | T | C | 119 | Tyr | His
13380398 | 370 | G | A | 121 | Trp | End
13380397 | 536 | G | C | 176 | Val | Val
13380396 | 548 | C | G | 180 | Tyr | End



NOV22a SNP data:

NOV22a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 89 and 90, respectively. The
nucleotide sequences of the NOV22a variants differ as shown in Table DQ.

Table DQ. cSNP and Coding Variants for NOV22a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380210 | 172 | T | A | 48 | Cys | Ser
13380209 | 342 | A | C | 104 | Ser | Ser



NOV23a SNP data:

NOV23a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 91 and 92, respectively. The
nucleotide sequences of the NOV23a variants differ as shown in Table DR.

Table DR. cSNP and Coding Variants for NOV23a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380218 | 647 | T | C | 143 | Thr | Thr
13380219 | 730 | C | T | 171 | Thr | Ile

NOV24a SNP data:

NOV24a has 4 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 95 and 96, respectively. The
nucleotide sequences of the NOV24a variants differ as shown in Table DS.

Table DS. cSNP and Coding Variants for NOV24a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380312 | 742 | G | A | 242 | Glu | Lys
13380313 | 960 | C | G | 314 | Thr | Thr
13380314 | 1144 | G | C | 0 | |
13380315 | 1462 | G | C | 0 | |







NOV25a SNP data:

NOV25a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 97 and 98, respectively. The
nucleotide sequences of the NOV25a variants differ as shown in Table DT.

Table DT. cSNP and Coding Variants for NOV25a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380290 | 1958 | A | G | 568 | Ala | Ala
13380291 | 2017 | A | C | 588 | Asp | Ala
13380292 | 2094 | C | T | 0 | |







NOV27a SNP data:

NOV27a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 107 and 108, respectively. The
nucleotide sequences of the NOV27a variants differ as shown in Table DU.

Table DU. cSNP and Coding Variants for NOV27a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380387 | 112 | A | G | 0 | |
13380385 | 948 | A | [illegible] | 273 | Pro | Pro



NOV28a SNP data:

NOV28a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 111 and 112, respectively. The
nucleotide sequences of the NOV28a variants differ as shown in Table DV.

Table DV. cSNP and Coding Variants for NOV28a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380287 | 265 | A | C | 25 | Ala | Ala
13380288 | 736 | G | T | 0 | |







NOV29a SNP data:

NOV29a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 123 and 124, respectively. The
nucleotide sequences of the NOV29a variants differ as shown in Table DW.

Table DW. cSNP and Coding Variants for NOV29a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13377405 | 1685 | G | A | 555 | Arg | Gln
13380222 | 2304 | T | C | 0 | |
13380223 | 2432 | G | A | 0 | |







NOV31a SNP data:

NOV31a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 129 and 130, respectively. The nucleotide
sequence of the NOV31a variant differs as shown in Table DX.

Table DX. cSNP and Coding Variants for NOV31a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380276 | 441 | C | T | 0 | |







NOV32b SNP data:

NOV32b has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 133 and 134, respectively. The
nucleotide sequences of the NOV32b variants differ as shown in Table DY.

Table DY. cSNP and Coding Variants for NOV32b

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380407 | 1810 | G | A | 311 | Ala | Thr
13375612 | 2273 | C | T | 465 | Thr | Met



NOV34a SNP data:

NOV34a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 137 and 138, respectively. The
nucleotide sequences of the NOV34a variants differ as shown in Table DZ.

Table DZ. cSNP and Coding Variants for NOV34a

Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified
13380213 | 635 | G | T | 205 | Cys | Phe
13380211 | 693 | T | C | 224 | His | His



NOV36b SNP data:

NOV36b has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 147 and 148, respectively. The
nucleotide sequences of the NOV36b variants differ as shown in Table DAA.



Table DAA. cSNP and Coding Variants for NOV36b

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380589 | 701 | A | G | 229 | Gln | Arg |
| 13380590 | 1413 | T | C | 466 | Gly | Gly |
| 13380591 | 2205 | G | A | 730 | Gln | Gln |



NOV37b SNP data: 

NOV37b has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 151 and 152, respectively. The
nucleotide sequences of the NOV37b variants differ as shown in Table DAB.



Table DAB. cSNP and Coding Variants for NOV37b 


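| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380597 | 157 | G | T | 0 | | |
| 13380626 | 1440 | T | C | 416 | Ile | Thr |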



NOV38a SNP data:

NOV38a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 155 and 156, respectively. The
nucleotide sequences of the NOV38a variants differ as shown in Table DAC.



Table DAC. cSNP and Coding Variants for NOV38a 


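| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380602 | 201 | A | G | 56 | Thr | Thr |
| 13380596 | 285 | A | T | 84 | Ser | Ser |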



NOV41a SNP data:

NOV41a has 4 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 161 and 162, respectively. The
nucleotide sequences of the NOV41a variants differ as shown in Table DAD.

Table DAD. cSNP and Coding Variants for NOV41a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380621 | 1499 | A | C | 494 | Thr | Thr |
| 13380620 | 1605 | G | A | 530 | Glu | Lys |
| 13380619 | 1649 | T | C | 544 | Val | Val |
| 13380618 | 2071 | T | C | 0 | | |







NOV42a SNP data: 

NOV42a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 163 and 164, respectively. The nucleotide
sequence of the NOV42a variant differs as shown in Table DAE.



Table DAE. cSNP and Coding Variants for NOV42a 


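| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380613 | 2607 | G | A | 841 | Ala | Thr |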



NOV43a SNP data: 

NOV43a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 167 and 168, respectively. The
nucleotide sequences of the NOV43a variants differ as shown in Table DAF.



Table DAF. cSNP and Coding Variants for NOV43a 


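| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380595 | 261 | C | T | 82 | Tyr | Tyr |
| 13380625 | 2989 | T | C | 992 | Tyr | His |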



NOV44c SNP data:

NOV44c has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 173 and 174, respectively. The nucleotide
sequence of the NOV44c variant differs as shown in Table DAG.

Table DAG. cSNP and Coding Variants for NOV44c

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380603 | 246 | G | A | 73 | Gly | Ser |



NOV45a SNP data:

NOV45a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 175 and 176, respectively. The nucleotide
sequence of the NOV45a variant differs as shown in Table DAH.



Table DAH. cSNP and Coding Variants for NOV45a 


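| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380607 | 1307 | G | T | 436 | Ser | Ile |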



NOV46a SNP data: 

NOV46a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 177 and 178, respectively. The
nucleotide sequences of the NOV46a variants differ as shown in Table DAI.



Table DAI. cSNP and Coding Variants for NOV46a 


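| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380612 | 815 | T | C | 231 | Leu | Ser |
| 13380611 | 1028 | C | T | 302 | Pro | Leu |
| 13380609 | 1815 | T | G | 0 | | |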







NOV47a SNP data: 

NOV47a has 9 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 179 and 180, respectively. The
nucleotide sequences of the NOV47a variants differ as shown in Table DAJ.



Table DAJ. cSNP and Coding Variants for NOV47a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380423 | 271 | A | G | 69 | Lys | Glu |
| 13380419 | 1869 | G | A | 601 | Met | Ile |
| [illegible] | [illegible] | A | G | [illegible] | Arg | Gly |
| 13380417 | 1939 | G | A | 625 | Asp | Asn |
| 13380416 | 1960 | C | G | 632 | Pro | Ala |
| 13380414 | 2094 | C | T | 676 | Pro | Pro |
| 13380411 | 2146 | G | A | 694 | Gly | Arg |
| 13380409 | 2317 | A | G | 751 | Thr | Ala |
| 13380408 | 2615 | G | T | 0 | | |







NOV48b SNP data: 

NOV48b has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 183 and 184, respectively. The
nucleotide sequences of the NOV48b variants differ as shown in Table DAK.



Table DAK. cSNP and Coding Variants for NOV48b 


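| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13377406 | 274 | A | G | 23 | Ile | Val |
| 13380605 | 294 | G | C | 29 | Arg | Arg |
| 13377408 | 821 | A | G | 205 | Glu | Gly |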



NOV49a SNP data: 

NOV49a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 185 and 186, respectively. The
nucleotide sequences of the NOV49a variants differ as shown in Table DAL.



Table DAL. cSNP and Coding Variants for NOV49a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380598 | 806 | T | C | 269 | Val | Ala |
| 13380599 | 878 | G | A | 293 | Arg | His |

NOV50a SNP data:

NOV50a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 189 and 190, respectively. The
nucleotide sequences of the NOV50a variants differ as shown in Table DAM.



Table DAM. cSNP and Coding Variants for NOV50a 


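| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380616 | 85 | A | G | 17 | Asp | Gly |
| 13380614 | 576 | C | T | 181 | Pro | Ser |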



NOV54a SNP data: 

NOV54a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 203 and 204, respectively. The
nucleotide sequences of the NOV54a variants differ as shown in Table DAN.



Table DAN. cSNP and Coding Variants for NOV54a 


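| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380622 | 1971 | G | C | 657 | Ala | Ala |
| 13380594 | 3003 | C | T | 1001 | Asp | Asp |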



NOV52a SNP data:

NOV52a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 197 and 198, respectively. The nucleotide
sequence of the NOV52a variant differs as shown in Table DAO.



Table DAO. cSNP and Coding Variants for NOV52a 


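| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380627 | 561 | C | T | 183 | Ala | Val |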



NOV61a SNP data:

NOV61a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 227 and 228, respectively. The nucleotide
sequence of the NOV61a variant differs as shown in Table DAP.



Table DAP. cSNP and Coding Variants for NOV61a 


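| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13379675 | 261 | T | A | 85 | Phe | Ile |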



NOV62a SNP data: 

NOV62a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 229 and 230, respectively. The
nucleotide sequences of the NOV62a variants differ as shown in Table DAQ.



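Table DAQ. cSNP and Coding Variants for NOV62a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380689 | 236 | T | C | 55 | Cys | Arg |
| 13380630 | 340 | T | C | 89 | Asp | Asp |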



NOV66a SNP data:

NOV66a has 2 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 241 and 242, respectively. The
nucleotide sequences of the NOV66a variants differ as shown in Table DAR.



Table DAR. cSNP and Coding Variants for NOV66a 


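| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380657 | 1659 | A | G | 551 | Gln | Arg |
| 13380658 | 1753 | G | A | 582 | Leu | Leu |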



NOV67a SNP data:

NOV67a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 245 and 246, respectively. The nucleotide
sequence of the NOV67a variant differs as shown in Table DAS.



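Table DAS. cSNP and Coding Variants for NOV67a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380649 | 1293 | C | A | 427 | Ala | Ala |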



NOV71a SNP data:

NOV71a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 253 and 254, respectively. The nucleotide
sequence of the NOV71a variant differs as shown in Table DAT.



Table DAT. cSNP and Coding Variants for NOV71a 


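| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380681 | 4458 | T | C | 1484 | Arg | Arg |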



NOV72a SNP data: 

NOV72a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 255 and 256, respectively. The nucleotide
sequence of the NOV72a variant differs as shown in Table DAU.



Table DAU. cSNP and Coding Variants for NOV72a 


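| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380650 | 3106 | C | T | 1005 | Leu | Leu |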



NOV74a SNP data: 

NOV74a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 265 and 266, respectively. The
nucleotide sequences of the NOV74a variants differ as shown in Table DAV.

Table DAV. cSNP and Coding Variants for NOV74a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380635 | 5439 | A | G | 1361 | Glu | Gly |
| 13380634 | 5825 | A | G | 1490 | Thr | Ala |
| 13380633 | 11081 | G | C | 0 | | |







NOV77a SNP data: 

NOV77a has 4 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 271 and 272, respectively. The
nucleotide sequences of the NOV77a variants differ as shown in Table DAW.



Table DAW. cSNP and Coding Variants for NOV77a 



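| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380655 | 414 | G | A | 31 | Val | Ile |
| 13380654 | 955 | A | G | 211 | Gln | Arg |
| 13380653 | 971 | C | T | 216 | Asp | Asp |
| 13380643 | 1800 | A | G | 493 | Ser | Gly |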



NOV80a SNP data:

NOV80a has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 277 and 278, respectively. The nucleotide
sequence of the NOV80a variant differs as shown in Table DAX.



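Table DAX. cSNP and Coding Variants for NOV80a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380632 | 940 | T | G | 0 | | |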







NOV81b SNP data:

NOV81b has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 281 and 282, respectively. The nucleotide
sequence of the NOV81b variant differs as shown in Table DAY.



Table DAY. cSNP and Coding Variants for NOV81b 


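| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380667 | 385 | C | T | 9 | Arg | Trp |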



NOV82b SNP data: 

NOV82b has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 285 and 286, respectively. The
nucleotide sequences of the NOV82b variants differ as shown in Table DAZ.



Table DAZ. cSNP and Coding Variants for NOV82b 



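| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380679 | 218 | G | A | 67 | Ala | Thr |
| 13380641 | 263 | A | G | 82 | Lys | Glu |
| 13380644 | 307 | T | C | 96 | Leu | Leu |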



NOV83b SNP data:

NOV83b has 1 SNP variant, whose variant positions for its nucleotide and amino acid
sequences are numbered according to SEQ ID NOs: 291 and 292, respectively. The nucleotide
sequence of the NOV83b variant differs as shown in Table DBA.



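Table DBA. cSNP and Coding Variants for NOV83b

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13377534 | 73 | C | T | 0 | | |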







NOV84a SNP data:

NOV84a has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 293 and 294, respectively. The
nucleotide sequences of the NOV84a variants differ as shown in Table DBB.



Table DBB. cSNP and Coding Variants for NOV84a 


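| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13380665 | 1934 | G | A | 632 | Asp | Asn |
| 13380663 | 2176 | T | C | 712 | Ser | Ser |
| 13380662 | 2544 | C | T | 835 | Ala | Val |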



NOV86b SNP data: 

NOV86b has 6 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 305 and 306, respectively. The
nucleotide sequences of the NOV86b variants differ as shown in Table DBC.



Table DBC. cSNP and Coding Variants for NOV86b 


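| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13377818 | 97 | A | G | 33 | Ser | Gly |
| 13377817 | 867 | A | C | 289 | Gln | His |
| 13377816 | 909 | A | G | 303 | Ile | Met |
| 13377815 | 919 | C | T | 307 | Pro | Ser |
| 13377814 | 926 | T | A | 309 | Leu | His |
| 13377813 | 1122 | T | C | 374 | Ser | Ser |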



NOV87a SNP data:

NOV87a has 6 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 307 and 308, respectively. The
nucleotide sequences of the NOV87a variants differ as shown in Table DBD.

Table DBD. cSNP and Coding Variants for NOV87a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13375628 | 421 | T | C | 51 | Ser | Pro |
| 13375629 | 557 | G | A | 96 | Gly | Asp |
| 13375630 | 562 | T | C | 98 | Ser | Pro |
| 13380645 | 916 | A | G | 0 | | |
| 13380646 | 1079 | A | G | 0 | | |
| 13380647 | 1134 | A | G | 0 | | |







NOV88a SNP data: 

NOV88a has 5 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 317 and 318, respectively. The
nucleotide sequences of the NOV88a variants differ as shown in Table DBE.



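Table DBE. cSNP and Coding Variants for NOV88a

| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13379305 | 583 | G | A | 144 | Ala | Thr |
| 13379294 | 938 | A | G | 262 | Gln | Arg |
| 13379295 | 1195 | A | G | 348 | Ile | Val |
| 13379304 | 1367 | T | C | 405 | Leu | Pro |
| 13379296 | 1757 | A | G | 535 | Tyr | Cys |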



NOV91c SNP data:

NOV91c has 3 SNP variants, whose variant positions for its nucleotide and amino
acid sequences are numbered according to SEQ ID NOs: 333 and 334, respectively. The
nucleotide sequences of the NOV91c variants differ as shown in Table DBF.



Table DBF. cSNP and Coding Variants for NOV91c 


| Variant | Nucleotide Position | Nucleotide Initial | Nucleotide Modified | Amino Acid Position | Amino Acid Initial | Amino Acid Modified |
| --- | --- | --- | --- | --- | --- | --- |
| 13378928 | 278 | A | G | 67 | Glu | Gly |
| 13380669 | 1158 | A | G | 360 | Glu | Glu |
| 13378927 | 1671 | G | A | 0 | | |







OTHER EMBODIMENTS 

Although particular embodiments have been disclosed herein in detail, this has been
done by way of example for purposes of illustration only, and is not intended to be limiting
with respect to the scope of the appended claims, which follow. In particular, it is
contemplated by the inventors that various substitutions, alterations, and modifications may
be made to the invention without departing from the spirit and scope of the invention as
defined by the claims. The choice of nucleic acid starting material, clone of interest, or
library type is believed to be a matter of routine for a person of ordinary skill in the art with
knowledge of the embodiments described herein. Other aspects, advantages, and
modifications are considered to be within the scope of the following claims. The claims
presented are representative of the inventions disclosed herein. Other, unclaimed inventions
are also contemplated. Applicants reserve the right to pursue such inventions in later claims.

CLAIMS

What is claimed is: 

1. An isolated polypeptide comprising the mature form of an amino acid
sequence selected from the group consisting of SEQ ID NO:2n, wherein n is an integer
between 1 and 172.

2. An isolated polypeptide comprising an amino acid sequence selected from the 
group consisting of SEQ ID NO:2n, wherein n is an integer between 1 and 172. 

3. An isolated polypeptide comprising an amino acid sequence which is at least 
95% identical to an amino acid sequence selected from the group consisting of SEQ ID 
NO:2n, wherein n is an integer between 1 and 172. 

4. An isolated polypeptide, wherein the polypeptide comprises an amino acid 
sequence comprising one or more conservative substitutions in the amino acid sequence 
selected from the group consisting of SEQ ID NO:2n, wherein n is an integer between 1 and
172.

5. The polypeptide of claim 1, wherein said polypeptide is naturally occurring.

6. A composition comprising the polypeptide of claim 1 and a carrier. 

7. A kit comprising, in one or more containers, the composition of claim 6. 

8. The use of a therapeutic in the manufacture of a medicament for treating a 
syndrome associated with a human disease, the disease selected from a pathology associated 
with the polypeptide of claim 1, wherein the therapeutic comprises the polypeptide of claim 
1. 

9. A method for determining the presence or amount of the polypeptide of claim
1 in a sample, the method comprising:

(a) providing said sample;

(b) introducing said sample to an antibody that binds immunospecifically to the 
polypeptide; and 

(c) determining the presence or amount of antibody bound to said polypeptide, 
thereby determining the presence or amount of polypeptide in said sample. 

10. A method for determining the presence of or predisposition to a disease 
associated with altered levels of expression of the polypeptide of claim 1 in a first 
mammalian subject, the method comprising: 

a) measuring the level of expression of the polypeptide in a sample from the first 
mammalian subject; and 

b) comparing the expression of said polypeptide in the sample of step (a) to the 
expression of the polypeptide present in a control sample from a second 
mammalian subject known not to have, or not to be predisposed to, said 
disease, 

wherein an alteration in the level of expression of the polypeptide in the first subject as 
compared to the control sample indicates the presence of or predisposition to said disease. 

11. A method of identifying an agent that binds to the polypeptide of claim 1, the
method comprising:

(a) introducing said polypeptide to said agent; and 

(b) determining whether said agent binds to said polypeptide. 

12. The method of claim 11, wherein the agent is a cellular receptor or a
downstream effector. 

13. A method for identifying a potential therapeutic agent for use in treatment of a
pathology, wherein the pathology is related to aberrant expression or aberrant physiological
interactions of the polypeptide of claim 1, the method comprising:

(a) providing a cell expressing the polypeptide of claim 1 and having a property
or function ascribable to the polypeptide;

(b) contacting the cell with a composition comprising a candidate substance; and

(c) determining whether the substance alters the property or function ascribable to
the polypeptide;

whereby, if an alteration observed in the presence of the substance is not observed when the
cell is contacted with a composition in the absence of the substance, the substance is
identified as a potential therapeutic agent.

14. A method for screening for a modulator of activity of or of latency or 
predisposition to a pathology associated with the polypeptide of claim 1, said method 
comprising: 

(a) administering a test compound to a test animal at increased risk for a
pathology associated with the polypeptide of claim 1, wherein said test animal
recombinantly expresses the polypeptide of claim 1;

(b) measuring the activity of said polypeptide in said test animal after 
administering the compound of step (a); and 

(c) comparing the activity of said polypeptide in said test animal with the activity 
of said polypeptide in a control animal not administered said polypeptide, 
wherein a change in the activity of said polypeptide in said test animal relative
to said control animal indicates the test compound is a modulator of activity of, or
of latency or predisposition to, a pathology associated with the polypeptide of
claim 1.

15. The method of claim 14, wherein said test animal is a recombinant test animal
that expresses a test protein transgene or expresses said transgene under the control of a 
promoter at an increased level relative to a wild-type test animal, and wherein said promoter 
is not the native gene promoter of said transgene. 

16. A method for modulating the activity of the polypeptide of claim 1, the
method comprising contacting a cell sample expressing the polypeptide of claim 1 with a
compound that binds to said polypeptide in an amount sufficient to modulate the activity of
the polypeptide.

17. A method of treating or preventing a pathology associated with the
polypeptide of claim 1, the method comprising administering the polypeptide of claim 1 to a
subject in which such treatment or prevention is desired in an amount sufficient to treat or
prevent the pathology in the subject.

18. The method of claim 17, wherein the subject is a human.

19. A method of treating a pathological state in a mammal, the method comprising 
administering to the mammal a polypeptide in an amount that is sufficient to alleviate the 
pathological state, wherein the polypeptide is a polypeptide having an amino acid sequence at 
least 95% identical to a polypeptide comprising the amino acid sequence selected from the 
group consisting of SEQ ID NO:2n, wherein n is an integer between 1 and 172, or a
biologically active fragment thereof. 

20. An isolated nucleic acid molecule comprising a nucleic acid sequence selected 
from the group consisting of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172.

21. The nucleic acid molecule of claim 20, wherein the nucleic acid molecule is
naturally occurring.

22. A nucleic acid molecule, wherein the nucleic acid molecule differs by a single 
nucleotide from a nucleic acid sequence selected from the group consisting of SEQ ID
NO:2n-1, wherein n is an integer between 1 and 172.

23. An isolated nucleic acid molecule encoding the mature form of a polypeptide 
having an amino acid sequence selected from the group consisting of SEQ ID NO:2n, 
wherein n is an integer between 1 and 172.

24. An isolated nucleic acid molecule comprising a nucleic acid selected from the 
group consisting of 2n-1, wherein n is an integer between 1 and 172.

25. The nucleic acid molecule of claim 20, wherein said nucleic acid molecule 
hybridizes under stringent conditions to the nucleotide sequence selected from the group 
consisting of SEQ ID NO:2n-1, wherein n is an integer between 1 and 172, or a complement
of said nucleotide sequence. 

26. A vector comprising the nucleic acid molecule of claim 20.

27. The vector of claim 26, further comprising a promoter operably linked to said
nucleic acid molecule. 

28. A cell comprising the vector of claim 26. 

29. An antibody that immunospecifically binds to the polypeptide of claim 1.

30. The antibody of claim 29, wherein the antibody is a monoclonal antibody. 

31. The antibody of claim 29, wherein the antibody is a humanized antibody.

32. A method for determining the presence or amount of the nucleic acid molecule 
of claim 20 in a sample, the method comprising: 

(a) providing said sample; 

(b) introducing said sample to a probe that binds to said nucleic acid molecule; 
and 

(c) determining the presence or amount of said probe bound to said nucleic acid
molecule,

thereby determining the presence or amount of the nucleic acid molecule in said sample.

33. The method of claim 32 wherein presence or amount of the nucleic acid
molecule is used as a marker for cell or tissue type.

34. The method of claim 33 wherein the cell or tissue type is cancerous.

35. A method for determining the presence of or predisposition to a disease
associated with altered levels of expression of the nucleic acid molecule of claim 20 in a first
mammalian subject, the method comprising:

a) measuring the level of expression of the nucleic acid in a sample from the
first mammalian subject; and

b) comparing the level of expression of said nucleic acid in the sample of step (a)
to the level of expression of the nucleic acid present in a control sample from a
second mammalian subject known not to have, or not to be predisposed to, the
disease;






wherein an alteration in the level of expression of the nucleic acid in the first subject as 
compared to the control sample indicates the presence of or predisposition to the disease. 

36. A method of producing the polypeptide of claim 1, the method comprising 
culturing a cell under conditions that lead to expression of the polypeptide, wherein said cell 
comprises a vector comprising an isolated nucleic acid molecule comprising a nucleic acid 
sequence selected from the group consisting of SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172. 



37. The method of claim 36 wherein the cell is a bacterial cell. 



38. The method of claim 36 wherein the cell is an insect cell. 

39. The method of claim 36 wherein the cell is a yeast cell. 

40. The method of claim 36 wherein the cell is a mammalian cell. 



41. A method of producing the polypeptide of claim 2, the method comprising
culturing a cell under conditions that lead to expression of the polypeptide, wherein said cell
comprises a vector comprising an isolated nucleic acid molecule comprising a nucleic acid
sequence selected from the group consisting of SEQ ID NO:2n-1, wherein n is an integer
between 1 and 172. 

42. The method of claim 41 wherein the cell is a bacterial cell. 



43. The method of claim 41 wherein the cell is an insect cell. 



44. The method of claim 41 wherein the cell is a yeast cell. 

45. The method of claim 41 wherein the cell is a mammalian cell. 
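For the reader's convenience only, and not as part of any claim, the SEQ ID NO numbering recited above can be worked through as follows: for each integer n, the nucleic acid sequence is SEQ ID NO:2n-1 and the corresponding polypeptide sequence is SEQ ID NO:2n. The sketch below assumes that "between 1 and 172" includes both endpoints. The function names `seq_id_pair` and `index_for` are hypothetical and do not appear in the specification.

```python
from typing import Tuple

MAX_N = 172  # upper bound on n recited in the claims


def seq_id_pair(n: int) -> Tuple[int, int]:
    """Return (nucleic acid SEQ ID NO, polypeptide SEQ ID NO) for index n."""
    # Assumption: the claimed range is read as inclusive of 1 and 172.
    if not 1 <= n <= MAX_N:
        raise ValueError("n must be an integer between 1 and 172")
    return 2 * n - 1, 2 * n


def index_for(seq_id: int) -> int:
    """Return the index n whose pair (2n-1, 2n) contains the given SEQ ID NO."""
    if not 1 <= seq_id <= 2 * MAX_N:
        raise ValueError("SEQ ID NO outside the range covered by the claims")
    return (seq_id + 1) // 2


if __name__ == "__main__":
    # SEQ ID NOs: 129 and 130 are the NOV31a nucleotide and amino acid sequences.
    print(index_for(129), seq_id_pair(index_for(129)))
```

Under this reading, the nucleotide and amino acid SEQ ID NOs cited together in each SNP data paragraph share a single value of n.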